Zipf's law states that if words of a language are ranked in the order of decreasing
frequency in texts, the frequency of a word is inversely proportional to its rank. It is very
robust as an experimental observation, but to date it has escaped satisfactory theoretical
explanation. We suggest that Zipf's law may arise from the evolution of word semantics
dominated by expansion of meanings and competition of synonyms.
Introduction



Zipf's law may be one of the most enigmatic and controversial regularities known in
linguistics. It has been alternatively billed as the hallmark of complex systems and
dismissed as a mere artifact of data presentation. The simplicity of its formulation and its
experimental universality and robustness starkly contrast with the obscurity of its meaning. In
its most straightforward form [1], it states that if words of a language are ranked in the
order of decreasing frequency in texts, the frequency is inversely proportional to the
rank,

$$f_k \propto k^{-1}, \qquad (1)$$



where $f_k$ is the frequency of the word with rank $k$. As an example, Fig. 1 is a log-log
plot of frequency vs. rank for a frequency dictionary of the Russian language [2], [3]. The
dictionary is based on a corpus of 40 million words, with special care taken to prevent
data skewing by words with high concentration in particular texts (like the word hobbit
in a Tolkien sequel).
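
A rank-frequency plot of this kind can be reproduced for any tokenized text. The following is a minimal sketch, not the procedure used for the dictionary above: it counts tokens, sorts the counts into a rank-frequency list, and estimates the log-log slope by ordinary least squares. The function names and the synthetic toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Return word counts sorted in decreasing order (rank 1 first)."""
    counts = Counter(tokens)
    return sorted(counts.values(), reverse=True)

def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(k) for k in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic check (hypothetical data): the word of rank k occurs round(1000 / k) times,
# so the fitted slope should come out close to -1.
toy_tokens = [f"w{k}" for k in range(1, 201) for _ in range(round(1000 / k))]
slope = loglog_slope(rank_frequency(toy_tokens))
```

A plain least-squares fit on log-log axes is the simplest choice and is known to be a crude estimator for power-law exponents; it is used here only to make the meaning of the plot concrete.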

Zipf's law is usually presented in a generalized form where the power law exponent
may be different from $-1$,

$$f_k \propto k^{-B}. \qquad (2)$$

Equivalently, it can be represented as a statement about the distribution function of
words according to their frequency,

$$P(f) \propto f^{-\beta}, \quad \beta = 1 + 1/B, \qquad (3)$$



"Prepublication draft. Submitted to Cognitive Science. 
Figure 1: Zipf's law for the Russian language




where $P(f)\,df$ represents the fraction of words with frequencies in $[f, f + df]$.
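
The step from the rank form to the density form can be spelled out. This is a sketch that treats the rank $k$ as a continuous variable and uses only the definitions given above. The rank of a word with frequency $f$ is the number of words at least as frequent, so inverting $f_k \propto k^{-B}$ gives

$$k(f) \propto f^{-1/B}, \qquad P(f) \propto \left|\frac{dk}{df}\right| \propto f^{-1/B - 1},$$

so that $\beta = 1 + 1/B$. For the classical case $B = 1$ this gives $\beta = 2$.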

According to [4], where an extensive bibliography is presented, various subsets of
the language obey the generalized Zipf's law (2). Thus, while the value of $B \approx 1$
is typical for single-author samples, different values, both greater and less than 1,
characterize speech of schizophrenics and very young children, military communications, 
or subsamples consisting of nouns only. 

Here we concentrate on the whole language case and do not consider these varia- 
tions. Neither do we attempt to generalize our treatment to include other power law 
probability distributions, which are ubiquitous in natural and artificial phenomena of 
various nature. The purpose of this work is to demonstrate that the inverse proportionality
(1) can be explained on purely linguistic grounds. Likewise, we don't pay special
attention to the systematic deviations from the inverse proportionality at the low-rank 
and high-rank ends. 

It is not possible to review the vast literature related to Zipf's law here. However,
it appears that the bulk of it is devoted to experimental results and phenomenological
models. Models that aim at explaining the underlying cause of the power law
and predicting the exponent are few. We review models of this type in
Section 1. In Section 2, we discuss the role in the language of words/meanings
having different degrees of generality. In Section 3, we show that Zipf's law can be
generated by some particular arrangements of word meanings over the semantic space.
In Section 4, we discuss the evolution of word meanings and demonstrate that it can
lead to such arrangements. Section 5 is devoted to numerical modeling of this process.
Discussion and prospects for further studies constitute Section 6. In Appendix A,
Mandelbrot's optimization model is considered in detail, and in Appendix B we discuss
the proportionality of word frequency to the extent of its meaning.

1 Some previous models 

Statistical models of Mandelbrot and Simon 

The two best-known models for Zipf's law in the linguistic domain are due to two
prominent figures in 20th-century science: Benoit Mandelbrot, of fractals fame,
and Herbert A. Simon, who is listed among the founding fathers of AI and complex
systems theory^1.

The simplest possible model exhibiting Zipfian distribution is due to Mandelbrot 
[5] and is widely known as random typing or intermittent silence model. It is just a 
generator of random character sequences where each symbol of an arbitrary alphabet 
has the same constant probability and one of the symbols is arbitrarily designated as a 
word-delimiting "space". The reason why "words" in such a sequence have a power-law
frequency distribution is very simple, as noted by Li [6]. Indeed, the number of possible
words of a given length is exponential in length (since all characters are equiprobable),
and the probability of any given word is also exponential in its length. Hence, the 
dependency of each word's frequency on its frequency rank is asymptotically given by 
a power law. In fact, the characters needn't even be equiprobable for this result to 
hold [6]. Moreover, a theorem due to Shannon [7] (Theorem 3 there) suggests that even 
the condition of independence between characters can be relaxed and replaced with 
ergodicity of the source. 
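
The random typing model is easy to simulate. The sketch below is an illustration under stated assumptions rather than Mandelbrot's or Li's own code: it uses a four-letter alphabet plus a space, all five symbols equiprobable, and the names `random_typing` and `predicted_exponent` are hypothetical. The exponent formula follows the counting argument in the paragraph above: with $M$ letters there are $M^L$ words of length $L$, each with probability proportional to $(M+1)^{-L}$.

```python
import math
import random
from collections import Counter

def random_typing(n_chars, letters="abcd", seed=0):
    """Type n_chars symbols at random; the letters and the space are all equally likely."""
    rng = random.Random(seed)
    symbols = letters + " "
    text = "".join(rng.choice(symbols) for _ in range(n_chars))
    return text.split()  # split() drops the empty strings between adjacent spaces

def predicted_exponent(n_letters):
    """Asymptotic rank-frequency exponent for equiprobable symbols.

    A word of length L has rank of order n_letters**L and probability
    proportional to (n_letters + 1)**(-L), hence f_k ~ k**(-log(M + 1) / log(M)).
    """
    return math.log(n_letters + 1) / math.log(n_letters)

words = random_typing(200_000)
counts = Counter(words)
freqs = sorted(counts.values(), reverse=True)  # frequency by rank, rank 1 first
```

Because all words of the same length are equiprobable, the simulated rank-frequency curve is a staircase whose envelope follows the power law; the exponent approaches 1 from above as the alphabet grows.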

Based on this observation, it is commonly held that Zipf's law is "linguistically 
shallow" (Mandelbrot [8]) and does not reveal anything interesting about the natural 
language. However it is easy to show that this conclusion is at least premature. The 
random typing model itself is undoubtedly "shallow", but it cannot be related to the 
natural language for the very simple reason that the number of distinct words of the 
same length in the real language is far from being exponential in length. In fact, it is
not even monotonic, as can be seen in Fig. 2, where this distribution is calculated from
a frequency dictionary of the Russian language [2] and from Leo Tolstoy's novel "War
and Peace". (It also doesn't matter that the frequency dictionary counts multiple word
forms as one word, while with "War and Peace" we counted them as distinct words.)
Thus, even if Zipf's law in natural language is indeed uninteresting, the random typing
model cannot prove this.

Taking a more general view, we observe that Zipf's law is created here by a simple 
stochastic process. But human speech is emphatically not a simple stochastic process. 
It is a highly structured phenomenon, driven by extralinguistic needs and stimuli and 
eventually used for communication of sentient beings in a real world. If emergence of 
Zipf's law may not be surprising in simple models, this doesn't make it less surprising 
in such an immensely complex process as speech. Why should words freely chosen by
people to communicate information, images and emotions, be subject to such a strict
probability distribution?

^1 As a historical aside, it is interesting to mention that Simon and Mandelbrot have exchanged rather
spectacularly sharp criticisms of each other's models in a series of letters in the journal Information and
Control in 1959-1961.

Another purely statistical model for Zipf's law applicable in various domains, including
language, was proposed by Simon [9], [10]. It is based on a much earlier work
by Yule [11], who introduced his model in the context of evolutionary biology (distribution
of species among genera) as early as 1925. Currently, this and related models are
known as preferential attachment or cumulative advantage models, since they describe 
processes where the growth rate of an object is proportional to its current size. 

In the linguistic domain, this model in its simplest form describes writing of a
continuous text as a process where the next word token^2 is selected with a constant
probability $p$ to be a new, never before encountered word, and with probability $1 - p$ to
be a copy of one of the previous word tokens (any one, with equal probabilities). In this
form, the model is not realistic, since it is well-known that instances of an infrequent 
word are not distributed evenly in texts, as the model would predict, but tend to occur 
in clusters. However, the model can be significantly relaxed. Namely, define n-word 
as a word that has occurred exactly n times in the preceding text. Suppose that the 
probability for the next word in the text to be an (any) n-word is equal to the fraction 
of all n-word tokens in the preceding sequence. Simon showed that this process still 
leads to the Zipfian distribution. The model can be further extended to account for 
words dropping out of use in such a way as to preserve the frequency distribution. 
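
The simplest form of Simon's process described above can be simulated directly. The following is a minimal sketch under the stated rule only (constant new-word probability, uniform copying of earlier tokens); it does not implement the relaxed n-word version or word death, and the function name and parameter values are illustrative assumptions.

```python
import random
from collections import Counter

def simon_text(n_tokens, p_new=0.1, seed=0):
    """Generate a token sequence by Simon's simplest rule.

    With probability p_new the next token is a never-seen word; otherwise it
    copies a uniformly chosen earlier token, so a word that already has n
    tokens is picked with probability proportional to n.
    """
    rng = random.Random(seed)
    text = [0]        # the first token is necessarily a new word (id 0)
    next_id = 1
    for _ in range(n_tokens - 1):
        if rng.random() < p_new:
            text.append(next_id)           # introduce a brand-new word
            next_id += 1
        else:
            text.append(rng.choice(text))  # copy a uniformly chosen earlier token
    return text

counts = Counter(simon_text(50_000))
freqs = sorted(counts.values(), reverse=True)  # frequency by rank, rank 1 first
```

Copying a uniformly chosen token is what makes the growth rate of a word proportional to its current count, which is the preferential attachment mechanism named in the text.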

In the latter form, Simon's model is compatible with word clustering. But is it
applicable to natural language? It is not quite straightforward to verify the assumptions
on which the model is based. In our calculations using Tolstoy's "War and
Peace" (about half a million words in Russian), which we don't report in detail here,
it appears that the assumption of a constant rate of new word introduction does not
hold. Rather, new words are introduced at a rate that decays approximately as a negative
power of $N$, where $N$ is the sequence number of the word in the text. As for the probability
that the next word is one of the n-words, it is more or less consistent with the model, except
for the most and the least frequent words. It is not clear, though, how critical these
departures are for the model.

^2 When the same word occurs multiple times in a sequence, we will speak of word tokens, occurrences, or
instances.

Simon also argued that the model could be applicable to the language as a whole, 
where the birth/death rate describes introduction of neologisms and words becoming 
obsolete, while the probability assumption describes word usage. 

It seems, though, that the model's explanatory power is not sufficient. Even if it is
correct, we are still left with the question of why it is correct. Simon's argument goes
approximately as follows. Suppose the next word choice is described by the probability
$P_{nk} = P_n t_{nk}$, where $P_{nk}$ is the probability that the $k$-th of the n-words will be selected,
$P_n$ is the fraction of all n-word tokens in the preceding text, and $t_{nk}$ describes a "topic"
factor, which favors words appropriate to the topic currently discussed in the text. It
is sufficient to require that $\sum_k t_{nk} = 1$ for all $n$ for the model to work. Thus the model
can even incorporate the idea that people select words according to a topic rather than
randomly. But why would the last equality hold? That is, why should the selection of
some (topical) n-words be at the expense of other n-words, and not at the expense of
some m-words with $n \neq m$?

More significantly, Simon's model seems to imply that the very fact of some words 
being frequent and others infrequent is a pure game of chance. But in reality, most rare 
words are rare just because they are rarely needed. Finally, it is not an idle question
why we need words with vastly different frequencies at all. Wouldn't it be more
efficient for all words to have about the same frequency? Simon's model doesn't begin 
to answer these questions. 

Guiraud's semic matrices 

A radically different approach was taken by the French linguist Pierre Guiraud^3. He
suggested that Zipf's law "would be produced by the structure of the signified, but would
be reflected by that of the signifier" [12]. Specifically, suppose that all word meanings
can be represented as superpositions of a small number of elementary meanings, or
semes. In keeping with the structuralist paradigm, each seme is a binary opposition,
such as animate/inanimate or actor/process (Guiraud's examples). Each seme can
be positive, negative or unmarked in any given word. Assuming that the semes are
orthogonal, so that seme values can be combined with each other without constraints,
with $N$ semes, there can be $2N$ single-seme words (i.e. words where only one seme
is marked), $2N(N-1)$ two-seme words, and so on. The number of words increases
roughly exponentially with the number of marked semes. On the other hand, assume
that all semes have the same probability to come up in a speech situation. Then the
probability of a word with m marked semes is also exponential in m. This leads to
Zipf's distribution for words.
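
The counting behind these figures can be checked directly. The sketch below assumes only what the paragraph states: $N$ unconstrained semes, each positive, negative or unmarked. A word with $m$ marked semes is specified by choosing which $m$ semes are marked and a sign for each, giving $\binom{N}{m} 2^m$ words. The function name and the value $N = 16$ are illustrative choices.

```python
from math import comb

def words_with_m_semes(n_semes, m):
    """Number of distinct words with exactly m marked semes out of n_semes,
    each marked seme being either positive or negative."""
    return comb(n_semes, m) * 2 ** m

N = 16  # illustrative value for the number of semes
counts = [words_with_m_semes(N, m) for m in range(N + 1)]
total = sum(counts)  # equals 3**N: each seme is positive, negative or unmarked
```

The ratio of consecutive counts is $2(N-m)/(m+1)$, which exceeds 1 only while $m < (2N-1)/3$, so the "roughly exponential" growth is an approximation that holds for small numbers of marked semes.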

From the formal point of view, the genesis of Zipf's distribution here is strikingly
similar to that in the random typing model. In both cases, the number of words and the
probability of a word are both exponential in some parameter (the number of marked
semes or the number of letters respectively). Indeed, by Guiraud's account in [12],
Mandelbrot initially formulated his model in terms of some hypothetical mental coding
units, and only later reformulated it in terms of letters. In Guiraud's model these
coding units turn out to be the semes.

^3 I am grateful to J.D. Apresjan who drew my attention to Guiraud's works.

This model is very attractive conceptually and heuristically, since it explains word 
frequencies as resulting from the language's function as a vehicle for meaning trans- 
fer. However it is too rigid and schematized to be realistic. It seems very unlikely 
that the meaning of any word can be decomposed into an unordered list of about 16 
(Guiraud's estimate) binary oppositions, even though theoretically that would suffice 
to form enough entries for a typical dictionary. In addition, the model crucially de- 
pends on the assumption that any combination of semes should be admissible, but even 
Guiraud's own examples show that it would be very hard to satisfy this requirement. In- 
deed, if actor/process seme is present with the value of process, then animate/inanimate 
has to be unmarked: there are no animate or inanimate verbs. (Some verbs, such as
laugh, imply the animateness of the actor, but that is a different trait. The point is that
there is no verb that would differ from laugh only in being inanimate, and that
undermines the notion of unrestricted combinability of semes.) Finally, the model doesn't
offer any diachronic perspective.

Models based on optimality principles 

Different authors proposed models based on the observation that Zipf's law maximizes
some quantity. If this quantity can be interpreted as a measure of "efficiency" in some
sense, then such a model can claim explanatory power.

Zipf himself surmised in [1] that this distribution may be a result of "effort minimization"
on the part of both speaker and listener. This argument goes approximately
as follows: the broader^4 the meaning of a word, the more common it is, because it is
usable in more situations. More common words are more accessible in memory, so their
use minimizes the speaker's effort. On the other hand, they increase the listener's effort,
because they require extra work on disambiguation of diffuse meanings. As a result of
a compromise between speaker and listener, a distribution emerges.

Zipf did not construct any quantitative model based on these ideas. The first model
of this sort was proposed by Mandelbrot [13]. It optimizes the cost of speech production
per bit of information transferred. Let the cost of producing word $w_k$ be $C_k$. The word's
information content, or entropy, is related to its frequency $p_k$ as $H_k = -\log_2 p_k$. The
average cost per word is given by $C = \sum_k p_k C_k$ and the average entropy per word
by $H = -\sum_k p_k \log_2 p_k$. One can now ask what frequency distribution $\{p_k\}$ satisfying
$\sum_k p_k = 1$ will minimize the ratio $C/H$. An easy calculation using Lagrange multipliers
leads to

$$p_k = A e^{-H C_k / C}, \qquad (4)$$

where $A$ is the normalization factor which needs to be chosen so that all the probabilities
sum up to 1. In order to obtain a power law, the cost $C_k$ needs to be logarithmic in
$k$, $C_k \propto \log k$. Mandelbrot derived this formula assuming that the cost of a word is
proportional to its length, and the number of different words of length $l$ is exponential
in $l$. Then, the result becomes almost trivial, since it's well known that maximum
information per letter is achieved by a random sequence of letters, and we return to
the random typing model. To cite Mandelbrot [5], "These variants are fully equivalent
mathematically, but they appeal to [...] different intuitions [...]".

^4 We will use broad or generic on the one hand and narrow or specific on the other to characterize the
extent or scope of a word's meaning.
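
The passage from an exponential dependence on cost to a power law in rank can be made explicit. The sketch below assumes that (4) has the form $p_k = A e^{-\lambda C_k}$ with a positive constant $\lambda$ fixed by the optimum, and that the cost is $C_k = c \ln k$; the symbols $\lambda$ and $c$ are introduced here for illustration only. Then

$$p_k = A e^{-\lambda c \ln k} = A\, k^{-\lambda c},$$

which is the generalized Zipf's law with exponent $B = \lambda c$. The normalization is $A = 1 / \sum_k k^{-\lambda c}$, which stays finite for an unbounded vocabulary only if $\lambda c > 1$; for a finite vocabulary any positive exponent can be normalized. The exponent therefore depends directly on the scale $c$ of the cost function.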

As we mentioned above, the assumption that the number of words is exponential in
word length is incorrect (Fig. 2). However, there is a different and much more plausible
argument for the direct relationship between cost and rank: $\log_2 k$ is the number of bits
that need to be specified in order to retrieve the $k$-th word from memory (if words are
stored in the order of decreasing frequency, which is a natural assumption), and thus
a good candidate for a cost estimate. We leave the detailed treatment of this case for
Appendix A, because it is not essential for the main argument here.

But once an optimization model is constructed, it is necessary to demonstrate
that the global optimum can actually be achieved via some local dynamics which is
causal and not teleological. Thus, the famous principle of least action in mechanics 
is equivalent to the local force-driven Newtonian dynamics. In the same way, a soap 
film on a wire frame achieves the global minimum of surface area via local dynamics of 
infinitesimal surface elements shifting and stretching under each other's tug. Just like 
surface elements do not "know" anything about the total area of the film, individual 
words do not "know" anything about the average information/cost ratio. 

Interestingly, in the case of Mandelbrot's optimizing model, such a local dynamics 
can be proposed. Namely, suppose that if speakers notice that a word's individual 
information/cost ratio is below average (the word has faded), they start using it less,
and conversely, if the ratio is favorable, the word's frequency increases. It turns out that
this local dynamics indeed leads to an establishment of a stable power-law distribution 
of word frequencies (see Appendix A for details). 

Even in this form, Mandelbrot's model has two problems. First, the power law
exponent turns out to be very sensitive to the details of the cost function $C_k$. This lack
of robustness is significant, because the pure logarithmic form of the cost function is just
a very rough approximation. The second problem is that the local dynamics described
above as the mechanism for a real language to achieve the optimum cost ratio is not
realistic. People will not start using a word like, say, table more frequently just because
it happens to have a favorable cost ratio. They will use it when they need to refer to
(anything that can be called) a table, no more and no less^5. And a compelling explanation
of Zipf's law has to comply with this reality.

A different model was proposed by Arapov & Shrejder [14]. They demonstrated
that the Zipfian distribution maximizes a quantity they call dissymmetry, which is the sum
of two entropies: $\Phi = H + H^*$, where $H$ is the standard entropy that measures the
number of different texts that can be constructed from a given set of word tokens (some
of which are identical), while $H^*$ measures the number of ways the same text can be
constructed from these tokens by permutations of identical word tokens. The former
quantity is maximized when all word tokens in a text are different, the latter one when
they are all the same, and the Zipfian distribution with its steep initial decline and long
tail provides the best compromise. This theoretical construct does possess a certain
pleasing symmetry, but its physical meaning is rather obscure, though the authors
claim that $\Phi$ should be maximized in "complex systems of natural origin".

^5 To be fair, something similar does occur in languages when so-called expressive synonyms change to
regular words. A well-known example is Russian глаз, 'eye', which initially meant 'pebble', then became
expressive for 'eye', and gradually displaced the original word for 'eye', око of Indo-European descent.
Another example is provided by French tête, 'head' below. But this is a different kind of dynamics involving
competition of two words. It will be considered below.

Balasubrahmanyan and Naranan [15] take a similar approach. They, too, aim to
demonstrate that the language is a "complex adaptive system", and that Zipf's law is
achieved in the state of maximum "complexity". Their derivation also involves defining
and combining different entropies, some of which are related to the permutation of
identical word tokens in the text. Both approaches of [14] and [15], in our view, have
the same two problems. First, the quantity being optimized is not compellingly shown
to be meaningful. Second, no mechanism is proposed to explain why and how the
language could evolve towards the maximum. To quote [15]:

As a general principle, an extremum is the most stable configuration and 
systems evolve to reach that state. We do not however understand the 
details of the dynamics involved. 

In a recent series of articles by Ferrer i Cancho with coauthors (see [16], [17] and
references therein) the optimization idea is brought closer to reality. Ferrer i Cancho's
(hereafter FiC) models significantly differ from the other models in that they are based 
on the idea that the purpose of language is communication, and that it is optimized for 
the efficiency of communication. FiC models postulate a finite set of words and a finite 
set of objects or stimuli with a many-to-many mapping between the two. Multiple 
objects may be linked to the same word because of polysemy, while multiple words 
may be linked to the same object because of synonymy. Both polysemy and synonymy 
are, indeed, common features of natural languages. It is assumed that the frequency of 
a word is proportional to the number of objects it is linked to. Next, FiC introduces 
optimality principles and, in some cases, constraints, with the meaning of coder's effort, 
decoder's effort, mutual entropy between words and objects, entropy of signals, and so 
on. By maximizing goal functions constructed from combinations of these quantities,
FiC demonstrated the emergence of Zipf's law in phase transition-like situations with
finely tuned parameters.
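
The basic data structure of these models, a many-to-many mapping with word frequency proportional to the number of linked objects, can be sketched in a few lines. This is only an illustration of the frequency assumption stated above, not of FiC's optimization procedure; the word and object labels and the function name are hypothetical.

```python
def word_frequencies(links):
    """links maps each word to the set of objects it is linked to.

    Returns relative frequencies proportional to the number of linked objects.
    """
    sizes = {word: len(objects) for word, objects in links.items()}
    total = sum(sizes.values())
    return {word: size / total for word, size in sizes.items()}

# Hypothetical toy mapping between words and objects o1..o3.
links = {
    "w1": {"o1", "o2", "o3"},  # polysemy: one word linked to several objects
    "w2": {"o1"},
    "w3": {"o1"},              # synonymy: w2 and w3 share the object o1
    "w4": {"o2"},
}
freqs = word_frequencies(links)
```

In this toy mapping the word linked to three objects receives half of the total frequency, and the two synonyms receive equal shares.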

The treatment in the present work, although quite different in spirit, shares two basic 
principles with FiC's models and, in a way, with Guiraud's ideas. First, we also consider 
it essential that language is used for communication and adopt the mapping metaphor of 
meaning (although at the early stages of language evolution, control of behavior rather 
than communication may have been its primary function — see e.g. [18]). Second, we 
postulate that word frequency is proportional to the extent, broadness, or generality 
of its meaning (see below for a more detailed discussion). But we also differ from FiC
and Zipf in a couple of important aspects. We do not assume any optimality principles
and neither do we use the notion of least effort. Instead, we show that Zipf's law can
be obtained as a consequence of a purely linguistic notion of avoidance of excessive
synonymy. It should be noted that our approach need not be mutually exclusive with
that of FiC. In fact, they may turn out to be complementary. It may also be compatible
with (but providing a deeper explanation than) Simon's model.

If one is to claim that word frequency in texts is related to some properties of its 
meaning, a theory of meaning must be presented upfront. Fortunately, it doesn't have
to be comprehensive; rather, we'll outline a minimal theory that only deals with the
single aspect of meaning that we are concerned with here: its extent. 

2 Synonymy, polysemy, semantic space 

The nature of meaning has long been the subject of profound philosophical discourse. 
What meaning is and how meanings are connected to words and statements is not at 
all a settled question. But whatever meaning is, we can operate with the notion of "the
set of all meanings", or "semantic space", because this doesn't introduce any significant
assumptions about the nature of meaning (except, maybe, its relative stability). Of 
course, we should exercise extreme caution to avoid assuming any structure on this set 
which we don't absolutely need. For example, it would be unwise to think of semantic 
space as a Euclidean space with a certain dimensionality (as is the case with Guiraud's 
semic matrices). One could justify the assumption of a metric on semantic space, 
because we commonly talk about meanings being more or less close to each other, 
effectively assigning a distance to a pair of meanings. However as we won't need it for 
the purposes of this work, metric will not be assumed. 

In fact, the only additional structure that we do assume on semantic space $S$ is a
measure. Mathematically, a measure on $S$ assigns a non-negative "volume" to subsets of
$S$, such that the volume of a union of two disjoint subsets is the sum of their volumes^6.
We need a measure so that we can speak of words being more "specific" or "generic" in
their meanings. If a word $w$ has a meaning $m(w) \subset S$, then the "degree of generality",
or "extent", or "broadness" of its meaning is the measure $\mu(m(w))$, i.e. the amount of
ground that the word covers in semantic space^7. Note that a measure does not imply
a metric: thus, there is a natural measure on the unordered set of letters of the Latin alphabet
(the "volume" of a subset is the number of letters in it), but to define a metric, i.e. to be able
to say that the distance between a and b is, say, 1, we need to somehow order the
letters.

We understand "meaning" in a very broad sense of the word. We are willing to 
say that any word has meaning. Even words like the and and have meanings: that 
of definiteness and that of combining respectively. We also want to be able to say 
that such words as together, joint, couple, fastener have meanings that are subsets of 
the meaning of and. By that we mean that in any situation where joint comes up,
and also comes up, though maybe implicitly (whatever that means). We do not make
a distinction between connotation and denotation, intension and extension, etc. This
means that "semantic space" $S$ may include elements of very different nature, such as
the notion of a mammal, the emotion of love, the feeling of warmth, and your cat
Fluffy. Such eclecticism shouldn't be a reason for concern, since words are in fact used
to express and refer to all these kinds of entities and many more.

^6 Many subtleties are omitted here, such as the fact that a measurable set may have non-measurable
subsets.

^7 We assume that meanings of words correspond to subsets of $S$. It may seem natural to model them
instead with fuzzy subsets of $S$, or, which is the same, with probability distributions on $S$. However the
author feels that there is already enough fuzziness in this treatment, so we won't develop this possibility.
Meanings may also be considered as prototypes, i.e. attractors in semantic space, but our model can be
adapted to this view as well.

We only deal with isolated words here, without getting into how the meaning of 
this dog results from the meanings of this and of dog. Whether it is simply a set 
theoretic intersection of thisness and dogness or something more complicated, we don't 
venture to theorize. The biggest problem here is probably that the semantic space itself
is not static: new meanings are created all the time as a result of human innovation
in the world of objects, as well as in the world of ideas; poets and mathematicians
are especially indefatigable producers of new meaning^8. However, when dealing with
individual words, as is the case with Zipf's law, one can ignore this instability, since 
words and their meanings are much more conservative, and only a small fraction of new 
meanings created by the alchemy of poetry and mathematics eventually claim words 
for themselves. 

Note that up to now we didn't have to introduce any structure on $S$, not even
a measure. Even the cardinality of $S$ is not specified; it could be finite, countable or
continuous. But we do need a measure for the next step, when we assume that the
frequency of the word $w$ is proportional to the extent of its meaning, i.e. to the measure
$\mu(m(w))$. The more generic the meaning, the more frequent the word, and vice versa,
the more specific the meaning, the less frequent the word.
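
The assumption can be made concrete on a toy finite semantic space. The sketch below is purely illustrative: the points of $S$, their weights, and the meanings assigned to the three words are invented, and a weighted counting measure is just one convenient choice of measure. It shows additivity on disjoint subsets, a specific meaning contained in a generic one, and frequencies proportional to $\mu(m(w))$.

```python
from fractions import Fraction

# Toy semantic space: each point carries a non-negative weight ("volume").
# Points and weights are invented for illustration.
weights = {"s1": 3, "s2": 1, "s3": 2, "s4": 2}

def mu(subset):
    """Measure of a subset of the semantic space: the sum of its point weights."""
    return sum(weights[s] for s in subset)

# Hypothetical meanings m(w) as subsets of S; the meaning of "joint"
# is contained in the meaning of "and".
meaning = {
    "and": {"s1", "s2", "s3"},
    "joint": {"s2"},
    "dog": {"s4"},
}

def relative_frequencies(meaning):
    """Frequencies proportional to the extent mu(m(w)) of each word's meaning."""
    total = sum(mu(m) for m in meaning.values())
    return {w: Fraction(mu(m), total) for w, m in meaning.items()}

freq = relative_frequencies(meaning)
```

Note that the code never needs a distance between points of the space, which mirrors the remark that a measure does not presuppose a metric.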

We don't have data to directly support this assumption, mostly because we don't 
know how to independently measure the extent of a word's meaning. One could think of 
ways to do this, such as the length of the word's dictionary definition or the number of 
all hyponyms of the given word (for instance, using WordNet^9). It would be interesting
to see if word frequency is correlated to such measures, but we are not aware of any 
research of this kind. The assumption itself however appears to be rather natural, and 
in Appendix B we provide some experimental evidence to support it. 

It is essential for this hypothesis that we do not reduce meaning to denotation,
but include connotation, stylistic characteristics, etc. It is easy to see that
word frequency can't be proportional to the extent of denotation alone: the word dog is
more frequent than the words mammal and quadruped, though its denotation (excluding
figurative senses) is a strict subset and thus narrower^10. But the frequency
of the word mammal is severely limited by its being a scientific term, i.e. its meaning 
extent is wider along the denotation axis, but narrower along the stylistic axis ("along 
the axis" should be understood metaphorically here, rather than technically). In the 
realm of scientific literature, where the stylistic difference is neutralized, mammal is 
quite probably more frequent than dog. 

It's interesting to note in this connection that according to the frequency dictionary
[2], the word собака 'dog' is more frequent in Russian than even the words животное
and зверь 'animal, beast', although there are no significant stylistic differences between
them. To explain this, note that of all animals, only the dog and the horse are so
privileged. A possible reason is that the connotation of animal in the common language
includes not so much the opposition 'animal as non-plant' as the opposition 'animal
as non-human'. But the dog and the horse are characteristically viewed as "almost-human"
companions, and thus in a sense do not belong to animals at all, which is why
the corresponding words do not have to be less frequent.

^8 For a much deeper discussion see [19]. In particular, it turns out that the rich paraphrasing capacity of
language may paradoxically be evidence of high referential efficiency.

^9 http://wordnet.princeton.edu/

^10 I owe this example to Tom Wasow.

The vocabulary of a natural language is structured so that there are words of dif- 
ferent specificity/generality. According to WordNet, a rose is a shrub is a plant is an 
organism is an object is an entity. There are at least two pretty obvious reasons for 
this. First, in some cases we need to refer to any object of a large class, as in take a 
seat, while in other cases we need a reference to a narrow class, as in you're sitting on a 
Chippendale. In the dialogue (5), two words, the generic one and the specific one, point
to the same object.

— I want some Tweakles!

— Candy is bad for your teeth. 

Second, when context provides disambiguation, we tend to use generic words instead
of specific ones. Thus, inhabitants of a large city's environs say I'm going to the city
and avoid naming it by name. Musicians playing winds call their instrument a horn,
whether it's a trumpet or a tuba. Pet owners say feed the cat, although the cat has a
name, and some of them perform a second generalization to feed the beast (also heard
in Russian as накорми животное). In fact, the word candy in the Tweakles example
fulfills both roles at once: it generalizes to all candies, because all of them are bad
for your teeth, but it also refers to this specific candy by contextual disambiguation.
We even use the ultimate generic placeholders like thingy when we have dropped something
and need somebody to pick it up for us. A colorful feature of Russian vernacular is the common
use of desemanticized expletives as generic placeholders, so that whole sentences complete
with adjectives and verbs can be formed without a single meaningful word. What may not
be generally appreciated is that this strategy may, at least in some cases, turn out to
be highly efficient. According to the author V. Konetsky [20], radio communications
of Russian WWII fighter pilots in a dogfight environment, where a split-second delay
can be fatal, consisted almost entirely of such pseudo-obscene placeholder words, as
evidenced by recordings. It could hardly have been so, were it not efficient.

The reason for this tendency to generalize is very probably the Zipfian minimization 
of effort for the speaker. A so-called word frequency effect is known in psycholinguistics, 
whereby the more frequent the word the more readily it is retrieved from memory (cf. 
[21], [22]). However, contrary to Zipf, it doesn't seem plausible that such generalization

^^As Ray Bradbury wrote in his 1943 story Doodad: "Therefore, we have the birth of incorrect semantic 
labels that can be used to describe anything from a hen's nest to a motor-beetle crankcase. A doohingey 
can be the name of a scrub mop or a toupee. It's a term used freely by everybody in a certain culture. 
A doohingey isn't just one thing. It's a thousand things." WordNet lists several English words under the 
definition "something whose name is either forgotten or not known". Interestingly, some of these words 
(gizmo, gadget, widget) developed a second sense, "a device that is very useful for a particular job", and one
(gimmick) similarly came to also mean "any clever (deceptive) maneuver". 
makes understanding more difficult for the listener. The whole idea of pitting the
speaker against the listener in an effort-minimization tug-of-war appears to fly in the
face of communication as an essentially cooperative phenomenon, where a loss or gain
of one party is a loss or gain of both. Again, we don't have hard data, but intuitively
it seems that when there is only one city in the context of the conversation, it is even
easier for the listener if it's referred to as the city rather than Moscow or New York. I'm
going to the city means I'm going you know where, while I'm going to London means I'm
going to this one of a thousand places where I could possibly go. The first expression is
easier not only for the speaker, but for the listener as well, because one doesn't have
to pull out one's mental map of the world, as with the second expression. Or, put
in information-theoretic terms, the city carries much less information than Shanghai
because the generic word implies a universal set consisting of one element, while the
proper name implies a much larger universal set of dozens of toponyms. But most of
this extra information is junk and has to be filtered out by the listener, if Shanghai is
in fact The City; and this filtering is a wasted effort.

3 Zipf's law and Zipfian coverings

Organization of words over semantic space in such a way that each element is covered 
by a hierarchy of words with different extent of meaning makes a lot of sense. In this 
way, the speaker can select a word that refers to the desired element with the desired 
degree of precision. Or, rather, the most imprecise word that still allows disambiguation 
in the given context. The benefit here is that less precise words are more frequent, 
and thus more accessible for both the speaker and the listener, which can be said to 
minimize the effort for both. Another benefit is that such organization is conducive
to building hierarchical classifications, which people are rather disposed to do (whether
that's because the world itself is hierarchically organized is immaterial here). There are
probably other benefits as well. 

Here is the simplest possible way to map words to semantic space in this hierarchical
manner: let word number 1 cover the whole of S, words number 2 and 3 cover one-half
of S each, words 4 through 7 cover one-quarter of S each, etc. (see Fig. 3). It is easy
to see that this immediately leads to Zipf's distribution. Indeed, taking the measure of S
to be unity, the extent of the k-th word is

$$\mu_k = 2^{-\lfloor \log_2 k \rfloor}. \qquad (6)$$

Under the assumption that the frequency of a word $f_k$ is proportional to the extent of
its meaning $\mu_k$, this is equivalent to (1) except for the piecewise-constant character of
(6), see Fig. 4. What matters here is the overall trend, not the fine detail.
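
The relation between the hierarchical extents and the pure Zipf value 1/k can be checked numerically. The sketch below is only an illustration: it assumes the measure of S is 1, and the function name `extent` is introduced here and is not part of the original text. It shows that k times the extent of the k-th word stays between 1 and 2, so the extents follow the 1/k trend up to a bounded piecewise-constant factor.

```python
def extent(k: int) -> float:
    """Extent of the k-th word in the hierarchical model: 2**(-floor(log2 k))."""
    if k < 1:
        raise ValueError("rank must be a positive integer")
    # k.bit_length() - 1 equals floor(log2 k) exactly for positive integers.
    return 2.0 ** -(k.bit_length() - 1)


# Compare the piecewise-constant extent with the pure Zipf value 1/k.
for k in (1, 2, 3, 4, 7, 8, 100, 1000):
    print(k, extent(k), round(k * extent(k), 3))
```

The product in the last column jumps back to 1 at every power of two, which is the staircase shape described for Fig. 4.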

Figure 3: Example of a hierarchical organization of semantic space.

Figure 4: Frequency distribution for the hierarchical model of Fig. 3.

Of course, real word meanings do not follow this neat, orderly model literally. But
it gives us an idea of what Zipf's distribution (1) can be good for. Consider a subset
of all words whose frequency rank is in the range [k, kp] with some k and p > 1. Zipf's
distribution has the following property: the sum of frequencies of words in any such
subset depends only on the scaling exponent p (asymptotically with $k \to \infty$), since by
Riemann's formula, it is bounded by the inequalities

$$\ln p = \int_k^{kp} \frac{dx}{x} < \sum_{n=k}^{kp} \frac{1}{n} < \int_{k-1}^{kp} \frac{dx}{x} = \ln\frac{kp}{k-1}. \qquad (7)$$
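
The bounds on the layer sum are easy to verify numerically. The following sketch assumes unnormalized frequencies 1/n and takes the upper rank to be the integer part of kp; the helper names `layer_sum` and `bounds` and the choice p = e (which makes ln p = 1) are illustrative assumptions, not taken from the original.

```python
import math


def layer_sum(k: int, p: float) -> float:
    """Sum of 1/n over ranks n = k, ..., floor(k*p)."""
    return sum(1.0 / n for n in range(k, int(k * p) + 1))


def bounds(k: int, p: float) -> tuple:
    """Lower and upper bounds ln p and ln(kp/(k-1)) on the layer sum; needs k >= 2."""
    return math.log(p), math.log(k * p / (k - 1))


p = math.e  # assumed scaling factor, chosen so that ln p = 1
for k in (10, 100, 1000, 10000):
    lower, upper = bounds(k, p)
    print(k, lower, layer_sum(k, p), upper)
```

As k grows, both bounds close in on ln p, so the total frequency of a layer depends on p alone.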



By our basic assumption, word frequency is proportional to the extent of its meaning. 
Thus, we can choose p so that the words in any subset [k, kp] together could cover
the whole semantic space S without gaps and overlaps: the sum of their meanings' 
measures will be equal to the total measure of S. Of course, this does not guarantee 
that they do cover S in such a way, but only for Zipf's distribution such a possibility 
exists. 

Let us introduce some notation at this point, to avoid bulky descriptions. Let S be
a measurable set with a finite measure $\mu$. Define a covering of S as an arbitrary sequence
of subsets $C = \{m_i\}$, $m_i \subset S$, $\mu(m_i) \ge \mu(m_{i+1})$. Let the gap of C be the measure of
the part of S not covered by C,

$$\mathrm{gap}(C) = \mu(S) - \mu\Big(\bigcup_i m_i\Big), \qquad (8)$$

and let the overlap of C be the measure of the part covered by more than one $m_i$,

$$\mathrm{overlap}(C) = \mu\big(\{x \mid x \in \text{more than one } m_i\}\big). \qquad (9)$$

Finally, define the (p, k)-layer of C as the subsequence $\{m_i\}$, $i \in [k, kp]$, for any starting rank
k > 0 and some scaling exponent p > 1.

With these definitions, define a Zipfian covering as an infinite covering such that for
some p, both gap and overlap of (p, k)-layers vanish as $k \to \infty$. This means that
all words with ranks in any range [k, kp] cover the totality of S and do not overlap
(asymptotically in $k \to \infty$). Or, to look at it from a different point of view, each point
in S is covered by a sequence of words with more and more precise (narrow, specific)
meanings, with precision growing in geometric progression with exponent p. Again,
this organization of semantic space would make a lot of sense, since it ensures the
homogeneity of the "universal classification": precision of terms increases by a constant
factor each time you descend to the next level. This is why the exponent B = 1 in (2)
is special: with other exponents one doesn't get the scale-free covering.

The covering in Fig. 3 is an example of a Zipfian covering, though a somewhat
degenerate one. We will not discuss the existence of other Zipfian coverings in the strict
mathematical sense, since the real language has only a finite number of words anyway,
so the limit of an infinite word rank is unphysical. We need this as a strict definition of
an idealized model which is presumably in an approximate correspondence with reality.

Note though that since $\sum_{i=1}^{n} 1/i$ grows indefinitely as $n \to \infty$, Zipf's law can be
normalized only if cut off at some rank N. The nature of this cut-off becomes very
clear in the present model: the language does not need words with arbitrarily narrow
meanings, because such meanings are more efficiently represented by combinations of
words.

However, as noted above, demonstrating that Zipf's law satisfies some kind of
optimality condition alone is not sufficient. One needs to demonstrate the existence of a
plausible local dynamics that could be responsible for the evolution towards the optimal
state. To this end, we now turn to the mechanisms and regularities of word meaning
change.

4 Zipfian coverings and avoidance of excessive synonymy

Word meanings change as languages evolve. This is the rule rather than the exception
(see, e.g. [23], [24]; most of the examples below come from these two sources). There are
various reasons for semantic change, among them need, other changes in the language,
social factors, "bleaching" of old words, etc. Some regularities can be observed in
the direction of the change. Thus, in many languages, words that denote grasping of
physical objects with hands develop the secondary meaning of understanding, "grasping
of ideas with mind": Eng. comprehend and grasp, Fr. comprendre, Rus. понимать and
схватывать, Germ. fassen illustrate various stages of this development. Likewise, Eng.
clear and Rus. ясный, прозрачный illustrate the drift from optical properties to mental
qualities. As a less spectacular, but ubiquitous example consider metonymic extension
from action to its result, as in Eng. wiring and Rus. проводка (idem). There may also
be deeper and more pervasive regularities [25]. Paths from old to new meanings are
usually classified in terms of metaphor, metonymy, specialization, ellipsis, etc. [26].

Polysemy, multiplicity of meanings, is pervasive in language: "cases of monosemy
are not very typical" [24]; "We know of no evidence that language evolution has made
languages less ambiguous" [27]; "word polysemy does not prevent people from
understanding each other" [24]. There is no clear-cut distinction between polysemy and
homonymy, but since Zipf's law deals with typographic words, we do not have to make
this distinction. In the "meaning as mapping" paradigm, one can speak of different
senses of a polysemous word as subsets of its entire meaning. Senses may be separate
(cf. sweet: 'tasting like sugar' and 'amiable'), they may overlap (ground: 'region,
territory, country' and 'land, estate, possession'), or one may be a strict subset of the
other (ball: 'any round or roundish body' and 'a spherical body used to play with').

Note that causes, regularity and paths of semantic change are not important for 
our purposes, since we are only concerned here with the extent, or scope, of meaning. 
And that can change by three more or less distinct processes: extension, formation, and 
disappearance of senses (although the distinction between extension and formation is 
as fuzzy as the distinction between polysemy and homonymy). 

Extension is illustrated by the history of Eng. bread, which initially meant '(bread)
crumb, morsel' ([23], p. 11), or Rus. палец 'finger, toe', initially 'thumb' ([24], p. 197-198).
With extension, the scope of meaning increases.

Formation of new senses may cause an increase in meaning scope or no change, if the
new sense is a strict subset of the existing ones. This often happens through ellipsis,
as with Eng. car, 'automobile' < motor car ([23], p. 299) or the parallel Rus. машина

^^This should not be confused with the dichotomy of sense and meaning. Here we use the word sense as 
in "the dictionary gave several senses of the word". 

^^Definitions here and below are from 1913 edition of Webster's dictionary. 
< автомашина. In this case, the word initially denotes a large class of objects, while
a noun phrase or a compound with this word denotes a subclass. If the subclass
is important enough, the specifier of the phrase can be dropped (via generalization
discussed above), and this elliptic usage is reinterpreted as a new, specialized meaning.

Meanings can decrease in scope as a result of a sense dropping out of use. Consider 
Eng. loaf < OE hlaf, 'bread'. Schematically one can say that the broad sense 'bread 
in all its forms' disappears, while the more special sense 'a lump of bread as it comes 
from the oven' persists. Likewise, Fr. chef, initially 'head as part of body', must have 
first acquired the new sense 'chief, senior' by metaphor, and only then lost the original 
meaning. 

In the mapping paradigm, fading of archaic words can also be interpreted as narrowing
of meaning. Consider Rus. перст, 'finger (arch., poet.)'. The reference domain
of this word is almost the same as that of палец, 'finger (neut.)' (excluding the sense
'toe'), but its use is severely limited because of a strong stylistic flavor. Thus, meaning scope is
reduced here along the connotation dimension. But since we consider both denotation
and connotation as constituents of meaning, narrowing of either amounts to narrowing
of meaning. Both types of narrowing are similar in that they tend to preserve stable
compounds, like meatloaf or один как перст 'lone as a finger'.

There is no symmetry between broadening and narrowing of meaning. Development
of new senses naturally happens all the time without our really noticing it. But
narrowing is typically a result of competition between words (except for the relatively
rare cases where a word drops out of use because the object it denoted disappears).
Whatever the driving forces were, hlaf lost its generic sense only because it was
supplanted by the expanding bread, chef was replaced by the expressive tête < testa,
'crock, pot', and перст by палец (possibly, also as an expressive replacement).

This is summarized by Hock and Joseph [23] (p. 236): 

[...] complete synonymy — where two phonetically distinct words would 
express exactly the same range of meanings — is highly disfavored. [...] 
where other types of linguistic change could give rise to complete synonymy, 
we see that languages — or more accurately, their speakers — time and 
again seek ways to remedy the situation by differentiating the two words 
semantically. 

And by Maslov [24], p. 201:

[...] since lexical units of the language are in systemic relationships with each 
other via semantic fields, synonymic sets, and antonymic pairs, it is natural 
that changes in one element of a microsystem entails changes in other related 
elements. 

One important feature of this process of avoiding excessive synonymy is that words 
compete only if their meanings are similar in scope. That is, a word whose meaning 
overlaps with that of a significantly more general word, will not feel the pressure of 
competition. As discussed earlier, the language needs (or rather its speakers need) 
words of different scope of meaning, so both the more general and the more specific 
words retain relevance. This is in a way similar to the effect reported by Wasow et
al. [27], where it was found (both by genetic simulation and by studying polysemous
word use in the Brown Corpus) that polysemy persists if one of the senses is significantly
more common than the other. Despite the fact that this result is related to polysemy
rather than to synonymy, it can also be interpreted as evidence that meanings do
not interact (compete) if they are sufficiently different in scope, whether they belong
to the same word (polysemy) or to different words (synonymy).

Summarizing the above, one can say that meanings tend to increase in scope, unless 
they collide with other meanings of a similar scope, while meanings of significantly 
different scope do not interact. But this looks just like a recipe for the development of 
approximately Zipfian coverings discussed in the previous section! Indeed, this kind of 
evolution could lead to semantic space being covered almost without gaps and overlaps 
by each subset of all words of approximately the same scope. In order to substantiate 
this idea two numerical models were developed. 

5 Numerical models 

The models simulate the two basic processes by which word meanings change in extent: 
generalization and specialization. They are very schematic and are not intended to be 
realistic. We model the semantic space by the interval S = [0, 1] and word meanings 
by sub-intervals on it. The evolution of the sub-intervals is governed by the following 
algorithms. 

Generalization model 

1. Start with a number N of zero-length intervals $r_i \subset S$ randomly distributed on S.

2. At each step, grow each interval symmetrically by a small length $\delta$, if it is not
frozen (see below).

3. If two unfrozen intervals intersect, freeze one of them (the one to freeze is selected
randomly).

4. Go to step 2 if there is more than one unfrozen interval left, otherwise stop.

Informally, words in the generalization model have a natural tendency to extend 
their meanings, unless this would cause excessive synonymy. If two expanding words 
collide, one of them stops growing. The other one can eventually encompass it com- 
pletely, but that is not considered to be "excessive synonymy", since by that time, the 
growing word is significantly more generic, and words of different generality do not 
compete. 
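
The four steps of the generalization model can be sketched in a few lines. This is a minimal illustration, not the code used for the figures: the function name, the seed handling, the convention that each end moves by half of the step length, and the clipping of intervals to S = [0, 1] are all assumptions made here. Because every unfrozen interval has the same length at any moment, only neighbouring unfrozen intervals need to be tested for intersection.

```python
import random


def generalization_model(n, delta, seed=0):
    """Return final interval lengths (longest first) for the generalization model."""
    rng = random.Random(seed)
    centers = sorted(rng.random() for _ in range(n))   # step 1: zero-length intervals
    half_at_freeze = [0.0] * n
    unfrozen = list(range(n))     # indices of growing intervals, ordered by center
    half = 0.0                    # common half-length of all unfrozen intervals
    while len(unfrozen) > 1:      # step 4: stop when one growing interval is left
        half += delta / 2.0       # step 2: symmetric growth, total length += delta
        i = 0
        while i < len(unfrozen) - 1:
            a, b = unfrozen[i], unfrozen[i + 1]
            if centers[b] - centers[a] < 2.0 * half:
                # step 3: two growing neighbours intersect, freeze one at random
                loser = i + rng.randrange(2)
                half_at_freeze[unfrozen[loser]] = half
                del unfrozen[loser]
            else:
                i += 1
    for idx in unfrozen:
        half_at_freeze[idx] = half
    lengths = []
    for c, h in zip(centers, half_at_freeze):
        # Assumption: the part of an interval outside S = [0, 1] is discarded.
        lengths.append(min(1.0, c + h) - max(0.0, c - h))
    return sorted(lengths, reverse=True)
```

Calling, for example, `generalization_model(2000, 1e-4)` and plotting the returned lengths against their rank on log-log axes gives a curve that can be compared with a straight line of slope -1.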

Specialization model 

1. Start with a number N of intervals, whose centers are randomly distributed on S
and lengths are uniformly distributed on [0, 1].

2. For each pair of intervals $r_i$, $r_j$, if they intersect and their lengths $l_i$, $l_j$ satisfy
$1/\gamma < l_i/l_j < \gamma$, decrease the smaller interval by the length of their intersection.

3. Continue until there is nothing left to change.
The specialization model simulates avoidance of excessive synonymy where synonyms
compete and one supplants the other in their common area. Parameter $\gamma$ determines
by how much the two words can differ in extent and still compete.
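
A direct, unoptimized sketch of the specialization model follows. Several details are assumptions made for this illustration only: intervals that stick out of S = [0, 1] are clipped, an interval lying entirely inside a competing larger one shrinks to zero length and drops out, and pairs are visited in a fixed order. The function names are hypothetical.

```python
import random


def specialization_model(n, gamma, seed=0):
    """Return the final intervals [lo, hi] of the specialization model."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(n):                     # step 1: random centers and lengths
        center, length = rng.random(), rng.random()
        # Assumption: intervals sticking out of S = [0, 1] are clipped to S.
        intervals.append([max(0.0, center - length / 2.0),
                          min(1.0, center + length / 2.0)])
    changed = True
    while changed:                         # step 3: repeat until nothing changes
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                a, b = intervals[i], intervals[j]
                len_a, len_b = a[1] - a[0], b[1] - b[0]
                if len_a <= 0.0 or len_b <= 0.0:
                    continue               # a word that was squeezed out completely
                lo, hi = max(a[0], b[0]), min(a[1], b[1])
                if hi <= lo or not (1.0 / gamma < len_a / len_b < gamma):
                    continue               # disjoint, or too different to compete
                small = a if len_a <= len_b else b
                # step 2: the intersection always touches an end of the smaller one
                if small[0] == lo:
                    small[0] = hi          # trim the left end (or the whole interval)
                else:
                    small[1] = lo          # trim the right end
                changed = True
    return intervals


def sorted_lengths(intervals):
    """Positive interval lengths, longest first."""
    return sorted((hi - lo for lo, hi in intervals if hi > lo), reverse=True)
```

The loop terminates because intervals only shrink, so a pair that has been separated never intersects again. `sorted_lengths(specialization_model(1000, 2.0))` returns the size distribution to be plotted against rank.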

Both these models reliably generate interval sets with sizes distributed by Zipf's
law with exponent B = 1. The generalization model is parameter-free (except for
the number of intervals, which is not essential as long as it is large enough). The
specialization model is surprisingly robust with respect to its only parameter $\gamma$: we ran
it with $\gamma \in [1.1, 10]$ with the same result (see Fig. 5). It is interesting to note that
with $\gamma = 1.1$, the specialization model even reproduces the low-rank behavior of the actual
rank distributions, but it is not clear whether this is a mere coincidence or something
deeper.




Both models also generate interval sizes that approximately satisfy the definition of
Zipfian covering. That is, if we consider the subset of all intervals between ranks k
and kp, they should cover the whole [0, 1] interval with no gap and overlap, for some
fixed p and asymptotically in $k \to \infty$. Fig. 6 shows the gap, i.e. the total measure
of that part of S not covered by these intervals, as a function of the starting rank k.
Scaling parameter p was chosen so that the sum of interval lengths between ranks k
and kp was approximately equal to 1. The fact that the gap indeed becomes very small
demonstrates that the covering is approximately Zipfian. This effect does not follow
from Zipf's law alone, because it depends not only on the size distribution, but also
on where the intervals are located on S. On the other hand, Zipf's distribution does
follow from the Zipfianness of the covering.
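
Gap and overlap of a layer are straightforward to compute for intervals on [0, 1]. The sketch below is an illustration with hypothetical helper names; it uses the dyadic hierarchical covering of Fig. 3 as test data rather than output of the numerical models, and takes the layer to contain ranks k through the integer part of kp.

```python
def union_measure(intervals):
    """Total length of the union of closed intervals given as (lo, hi) pairs."""
    total, cur_lo, cur_hi = 0.0, None, None
    for lo, hi in sorted(intervals):
        if cur_hi is None or lo > cur_hi:
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total


def gap(intervals, total_measure=1.0):
    """Measure of the part of S left uncovered (S is assumed to have measure 1)."""
    return total_measure - union_measure(intervals)


def overlap(intervals):
    """Measure of the set of points covered by more than one interval."""
    events = []
    for lo, hi in intervals:
        events.append((lo, 1))
        events.append((hi, -1))
    events.sort()  # at equal positions, closings (-1) sort before openings (+1)
    depth, prev, covered_twice = 0, 0.0, 0.0
    for x, step in events:
        if depth >= 2:
            covered_twice += x - prev
        depth += step
        prev = x
    return covered_twice


def layer(covering, k, p):
    """(p, k)-layer: members with ranks k..floor(k*p); covering[0] has rank 1."""
    return covering[k - 1:int(k * p)]


def dyadic_covering(levels):
    """Hierarchical covering: level j holds 2**j intervals of length 2**-j."""
    return [(i / 2 ** j, (i + 1) / 2 ** j)
            for j in range(levels) for i in range(2 ** j)]


covering = dyadic_covering(6)
for k in (2, 4, 8, 16):
    members = layer(covering, k, 2)
    print(k, gap(members), overlap(members))
```

For this covering with p = 2 the gap is zero and the overlap equals 1/(2k), the length of the single extra interval borrowed from the next level, so both vanish as k grows.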

Of course, these models provide but an extremely crude simulation of the linguis- 
tic processes. However the robustness of the result suggests that quite possibly they 
represent a much larger class of processes that can lead to Zipfian coverings and hence 
Zipf's distributions under the same very basic assumptions. 
Figure 6: The gap of the (p, k)-layer decreases with increasing k.

6 Discussion



To summarize, we propose the following. 

1. Word meanings have a tendency to broaden. 

2. On the other hand, there is a tendency to avoid excessive synonymy, which coun- 
teracts the broadening. 

3. Synonymy avoidance does not apply to any two words that differ significantly in 
the extent of their meanings. 

4. As a result of this, word meanings evolve in such a way as to develop a multi-layer 
covering of the semantic space, where each layer consists of words of approximately 
the same broadness of meaning, with minimal gap and overlap. 

5. We call arrangements of this sort Zipfian coverings. It is straightforward to show
that they possess Zipf's distribution with exponent B = 1.

6. Since word frequency is likely to be in a direct relationship with the broadness of 
its meaning, Zipf's distribution for one of them entails the same distribution for 
the other. 

This model is rooted in linguistic realities and demonstrates an evolutionary path
for the language to develop Zipf's distribution of word frequencies. Not only does it predict
the power law, it also explains the specific exponent B = 1. Even though we argue
that Zipfian coverings are in some sense "optimal", we do not need this optimality to be 
the driving force, and can in fact do entirely away with this notion, because the local 
dynamics of meaning expansion and synonymy avoidance is sufficient. The "meaning" 
of Zipf's distribution becomes very clear in this proposal. 

The greatest weakness of the model is that it is based upon a rather vague theory
of meaning. The assumption of proportionality of word frequency to the extent of its
meaning is natural (indeed, if one accepts the view that "meaning is usage", it becomes
outright tautological), but it is unverifiable as long as we have no independent way to
measure both quantities or at least compare meaning extents of different words. On
the other hand, comparison of the meaning extent of the same word at different historical
stages is a less ill-defined notion. See also Appendix B. Further studies are necessary
to clarify this issue. As one possibility, a direct estimate of word meaning extent might
be obtained on the basis of the Moscow semantic school's Meaning-Text Theory (e.g.
[28], [29]), which provides a well-developed framework for describing meanings.

The treatment in this work was restricted to the linguistic domain. However, as is
well known, Zipf's law is observed in many other domains. The mechanism of competitive
growth proposed here could be applicable to some of them. Whenever one has
entities that a) exhibit the tendency to grow, and b) compete only with like-sized entities,
the same mechanism will lead to Zipfian covering of the territory and consequently
to Zipf's distribution of sizes.

Appendix A: Mandelbrot's model revisited 

Mandelbrot set out to demonstrate that Zipf's law could be derived from the assumption
that the language is optimal in the sense that it minimizes the average ratio of
production cost to information content. The cost of "producing" a word was chosen
to be proportional to the number of letters in it, and information content was defined
to be the Shannon entropy. It is well known that the maximum entropy per letter
is achieved by random sequences of letters, just because entropy is a measure of
unpredictability, and random sequences are the most unpredictable. Thus, under these
assumptions the optimal language is the one where each sequence of n letters is as
frequent as any other. But we already know from the analysis of the random typing
model that this does produce Zipf's distribution.

Mandelbrot understood well the relationship between his optimality model and ran- 
dom typing model and remarked in [5] that "these variants are fully equivalent mathe- 
matically, but they appeal to such different intuitions that the strongest critics of one 
may be the strongest partisans of another". However the optimality model provides a 
framework that can be extended beyond this equivalence. 

First of all, let us briefly reproduce the mathematical derivation of Zipf's law
from the optimality principle. Let k be the frequency rank of the word $w_k$, let its
frequency (normalized so that the sum of all frequencies is unity) be $p_k$, and the cost
of producing word $w_k$ be $C_k$. It makes sense to leave the function $C_k$ unspecified for as
long as possible. The word's information content, or entropy, is related to its frequency
$p_k$ as $H_k = -\log_2 p_k$. The average cost per word is given by

$$C = \sum_k p_k C_k \qquad (10)$$

and the average entropy per word by

$$H = -\sum_k p_k \log_2 p_k. \qquad (11)$$
One can now ask what frequency distribution $\{p_k\}$ satisfying $\sum_k p_k = 1$ will minimize
the cost ratio $C^* = C/H$.

We can use the standard method of Lagrange multipliers to find the minimum of
$C^*$, given the normalization constraint on $p_k$:

$$\frac{\partial}{\partial p_k}\Big(C^* - \lambda \sum_j p_j\Big) = 0. \qquad (12)$$

Here the value of the Lagrange multiplier $\lambda$ is to be determined later so as to normalize
the frequencies. Performing the differentiation in (12), we obtain

$$\frac{C_k}{H} + \frac{C}{H^2}\,(\log_2 p_k + 1) - \lambda = 0, \quad \forall k. \qquad (13)$$

This expresses the frequencies $p_k$ given costs $C_k$:

$$p_k = \lambda' \, 2^{-H C_k / C}, \qquad (14)$$

where we denoted

$$\lambda' = 2^{\lambda H^2 / C - 1}. \qquad (15)$$

Thus, $\lambda'$ is an arbitrary constant that we can use directly to normalize frequencies.
Now, once the cost $C_k$ of each word is known or assumed, eq. (14) yields the frequency
distribution for the words. Note though that to obtain a closed-form solution, one also
needs to consistently determine the constants C and H in the RHS of (14) from their
respective definitions (10) and (11).

Now, it is easy to see from eq. (14) that a power law for frequencies could only result
from the ansatz

$$C_k = C_0 \log_2 k, \qquad (16)$$

which leads to

$$p_k = \lambda' k^{-B}, \quad B = H C_0 / C \qquad (17)$$

(note that $C \propto C_0$, so $C_0/C$ doesn't depend on $C_0$). How could one justify eq. (16)?
In Mandelbrot's original formulation, as we already mentioned, the cost of a word was
assumed to be proportional to its length, and then the only way to get the logarithmic
dependency on the rank is to assume that the number of distinct words grows
exponentially with length. It is not necessary in this formulation to postulate that
any combination of letters of a given length is equally probable, but even this weaker
requirement is not realistic for natural languages, as demonstrated by Fig. 2.

There is, however, a much more plausible argument in favor of the desired ansatz (16),
which does not depend on any assumptions about word length at all. Suppose words
are stored in some kind of an addressable memory. For simplicity, one can imagine a
linear array of memory cells, each containing one word. Then, the cost of retrieving the
word in the k-th cell can be assumed to be proportional to the length of its address,
that is to the minimum number of bits (or neuron firings, say) needed to specify the
address. And this is precisely $\log_2 k$. Of course, this doesn't depend on memory being
in any real sense "linear".
It's important to note that this is not just a different justification, because with it
the optimality model is no longer equivalent to the random typing model. Let us now
proceed to solving (17). From the normalization condition for frequencies, we get

$$\lambda' = \frac{1}{\zeta(B)}, \qquad (18)$$

where $\zeta$ is the Riemann zeta function, $\zeta(s) = \sum_{n=1}^{\infty} n^{-s}$. But this is not the end of the
story, since B is related to H and C via eq. (17), and they in turn depend on B via $p_k$.
This amounts to an equation for the power law exponent B, which thus is not arbitrary.
By substituting (18) back into (10) and (11), we get

$$C = \frac{C_0}{\zeta(B)} \sum_{k=1}^{\infty} k^{-B} \log_2 k, \qquad (19)$$

$$H = -\frac{1}{\zeta(B)} \sum_{k=1}^{\infty} k^{-B} \log_2\!\big(\zeta(B)^{-1} k^{-B}\big). \qquad (20)$$

It is now easy to see that $B = H C_0 / C$ can only be satisfied when $\zeta(B) = 1$, which
implies $B \to \infty$. This is not a very encouraging result, since it means that the minimum
cost per unit information is achieved when there's only one word in use, and both cost
and information vanish.
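
The step leading to this condition can be spelled out as a short worked equation. It only restates the expressions for C and H with frequencies $p_k = k^{-B}/\zeta(B)$, splitting the logarithm in the entropy into two terms:

$$H = \frac{B}{\zeta(B)} \sum_{k=1}^{\infty} k^{-B} \log_2 k + \log_2 \zeta(B) = \frac{B\,C}{C_0} + \log_2 \zeta(B), \qquad\text{so}\qquad \frac{H C_0}{C} = B + \frac{C_0}{C} \log_2 \zeta(B).$$

Hence $B = H C_0 / C$ requires $\log_2 \zeta(B) = 0$. Since $\zeta(B) = 1 + 2^{-B} + 3^{-B} + \dots > 1$ for every finite $B > 1$, the condition is met only in the limit $B \to \infty$.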

This conclusion is borne out by a simple numerical simulation. Recall that in 
Section 2, we noted that cost ratio optimization can be achieved via local dynam- 
ics. Namely, if speakers notice that a word's individual information/cost ratio is below 
average, they start using it less, and conversely, if the ratio is favorable, the word's
frequency increases. It is hard to tell a priori whether this process would converge 
to a stationary distribution, so numerical simulation was performed. The following 
algorithm implements this dynamics: 



Cost ratio optimization algorithm

1. Initialize an array of N frequencies $p_k$ with random numbers and normalize them.

2. Calculate average cost and information per word according to (10), (11).

3. For each k = 1, ..., N, calculate the cost ratio for the k-th word as $C^*_k = C_k/H_k =
-\log_2 k / \log_2 p_k$. If it is within the interval $[(1-\gamma)C^*, (1+\gamma)C^*]$, where $\gamma$ is a
parameter, leave $p_k$ unchanged. Otherwise decrease $p_k$ by a constant factor if its
cost ratio is above the average, or increase it by the same factor if it is below the
average.

4. If no frequencies were changed, stop.

5. Reorder words (i.e. reassign ranks in the decreasing order of frequency), renormalize
frequencies and repeat from step 2.

This procedure quickly leads to the state where all frequencies but one are zero. 
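
The local dynamics can be sketched as follows. This is a minimal illustration under stated assumptions, not the original simulation code: the cost is taken as $\log_2 k$ (that is, $C_0 = 1$), and the band width `gamma`, the update `factor`, the step limit `max_steps`, and the seed are arbitrary illustrative values. In line with the dynamics described in the text, a word whose cost per unit of information lies above the band loses frequency and a word below the band gains it.

```python
import math
import random


def cost_ratio_dynamics(n, gamma=0.1, factor=1.05, max_steps=2000, seed=0):
    """Iterate the local dynamics; returns frequencies ordered by rank."""
    rng = random.Random(seed)
    p = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(p)
    p = sorted((x / total for x in p), reverse=True)      # step 1
    for _ in range(max_steps):
        # step 2: average cost and entropy per word, with C_k = log2 k
        cost = sum(pk * math.log2(k) for k, pk in enumerate(p, start=1))
        info = -sum(pk * math.log2(pk) for pk in p if pk > 0.0)
        if cost <= 0.0 or info <= 0.0:
            break                                         # a single word is left in use
        average = cost / info
        changed = False
        for k in range(1, n + 1):                         # step 3
            pk = p[k - 1]
            if not 0.0 < pk < 1.0:
                continue
            ratio = math.log2(k) / -math.log2(pk)
            if ratio > (1.0 + gamma) * average:
                p[k - 1] = pk / factor                    # costly word: use it less
                changed = True
            elif ratio < (1.0 - gamma) * average:
                p[k - 1] = pk * factor                    # cheap word: use it more
                changed = True
        if not changed:
            break                                         # step 4
        total = sum(p)                                    # step 5: renormalize, re-rank
        p = sorted((x / total for x in p), reverse=True)
    return p
```

Inspecting the first few entries of `cost_ratio_dynamics(100)` after increasing numbers of steps shows how the share of the top-ranked word evolves. The same function with the cost replaced by $\log_2(k + k_0)$ would correspond to the modified ansatz discussed below.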

So the ansatz (16) does not eventually lead to the desired result. It is probably this
problem that prompted Mandelbrot to propose a modification to Zipf's law. In his
own words ([5], p. 356),

...it seems worth pointing out that it has not been obtained by "mere curve
fitting": in attempting to explain the first approximation law, $i(r, k) =
(1/10)\,k\,r^{-1}$, I invariably obtained the more general second approximation,
and only later did I realize that this more general formula was necessary and
basically sufficient to fit the empirical data.

It turns out that the degeneracy problem can be avoided by the following modification
of the cost function ansatz:

$$C_k = C_0 \log_2 (k + k_0). \qquad (21)$$

It looks rather natural if we again imagine the linear memory, but this time with
the first $k_0$ cells not occupied by useful words. Substitution of (21) into (14) yields the
Zipf-Mandelbrot law

$$p_k = \frac{(k + k_0)^{-B}}{\zeta(B, 1 + k_0)}, \qquad (22)$$

where $\zeta$ is now the Hurwitz zeta function, $\zeta(s, q) = \sum_{n=0}^{\infty} (n + q)^{-s}$.

The Zipf-Mandelbrot formula has the potential of correctly approximating not only the 
power law, but also the initial, low-rank range of real frequency distributions, 
which flatten out at $k < 10$ or so. But remember again that the condition 
$B = HC_0/C$ needs to be satisfied, which means that the parameters $k_0$ and $B$ are not 
independent. This is rarely, if ever, mentioned in the literature, although it is a rather 
important constraint. Substituting (22) into the expressions for $H$ and $C$ and noting that 

$$\frac{\partial}{\partial s}\zeta(s, q) = -\sum_{n=0}^{\infty} (n + q)^{-s} \ln(n + q) \qquad (23)$$

we obtain 

$$H = -\frac{B}{\ln 2}\,\frac{\zeta'(B, 1 + k_0)}{\zeta(B, 1 + k_0)} + \frac{\ln \zeta(B, 1 + k_0)}{\ln 2} \qquad (24)$$

$$C = -\frac{C_0}{\ln 2}\,\frac{\zeta'(B, 1 + k_0)}{\zeta(B, 1 + k_0)} \qquad (25)$$

$$B = HC_0/C \qquad (26)$$

where $\zeta'$ is the derivative over the first argument. After simple transformations this 
reduces to 

$$\ln \zeta(B, 1 + k_0) = 0 \qquad (27)$$

that is 

$$\zeta(B, 1 + k_0) = 1 \qquad (28)$$

When $k_0 \to 0$, $B \to \infty$, as previously. In the opposite limit, $k_0 \to \infty$, the Zipfian 
exponent $B$ tends to 1, but extremely slowly. To see this, let $k_0$ be a large integer. 
Then, 

$$\zeta(B, 1 + k_0) = \zeta(B) - \sum_{n=1}^{k_0} n^{-B} \qquad (29)$$

In order to compensate for the infinite growth of the second term as $k_0 \to \infty$, $B$ must 
tend to 1, where Riemann's zeta function has a pole. Let $B = 1 + \epsilon$, $\epsilon \ll 1$, then 

$$\zeta(B) = O(1/\epsilon) \qquad (30)$$

$$\sum_{n=1}^{k_0} n^{-B} = O(\ln k_0) \qquad (31)$$

whence $\epsilon \ln k_0 = O(1)$, or $B = 1 + O(1/\ln k_0)$. 
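
The constraint that the Hurwitz zeta function of $(B, 1 + k_0)$ equals 1 can be explored numerically. The following is only a minimal sketch, not the computation used in the paper: the function names `hurwitz_zeta` and `zipf_exponent`, the Euler-Maclaurin tail correction, and the bisection bounds are all assumptions made for illustration.

```python
import math

def hurwitz_zeta(s, q, n_direct=50):
    """Approximate sum_{n>=0} (n + q)**(-s) for s > 1, q > 0.

    The first n_direct terms are summed directly; the remainder is
    estimated with the leading Euler-Maclaurin terms (an assumption of
    this sketch, adequate for illustration).
    """
    head = sum((n + q) ** (-s) for n in range(n_direct))
    a = n_direct + q
    tail = a ** (1 - s) / (s - 1) + 0.5 * a ** (-s) + s * a ** (-s - 1) / 12
    return head + tail

def zipf_exponent(k0, lo=1.0 + 1e-9, hi=60.0, iters=200):
    """Solve hurwitz_zeta(B, 1 + k0) = 1 for B by bisection (k0 > 0).

    The zeta value decreases monotonically in B, so bisection applies.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if hurwitz_zeta(mid, 1 + k0) > 1:
            lo = mid   # zeta still too large: B must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for k0 in (1, 10, 100, 1000):
        print(k0, zipf_exponent(k0))
```

Printing the exponent for growing values of `k0` gives a feel for how slowly it approaches 1.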

The relationship between $B$ and $k_0$ can be calculated numerically, but this would 
not tell us whether the resulting solution is stable with respect to the local dynamics 
described above. Running the local dynamics model shows that, in contrast to the case 
$k_0 = 0$, the model does converge to a stable solution described by (22), as shown in 
Fig. 7. 



Figure 7: Zipf-Mandelbrot law with different values of $k_0$. Real frequency distribution (not 
to scale) and Zipf's law are shown for comparison. 

However, as is readily seen from the figure, no value of $k_0$ yields a satisfactory 
approximation to the actual distribution. For small $k_0$, the slope is still significantly 
steeper than $-1$, but for larger $k_0$, the flattened portion spreads too far. Thus, with 
$k_0 = 10$, the slope is still about $-1.4$, but the power law starts at about $k = 100$, while 
in the actual distribution it begins after $k = 10$. 

To sum up, Zipf-Mandelbrot law can be obtained from a model optimizing the 
information/cost ratio with no assumptions about word lengths. This model is not 
equivalent to the random typing model, and allows the optimum to be achieved via local 
dynamics, i.e. in a causal, rather than teleological manner. However, the distributions 
obtained in this way do not provide a reasonable fit to the actual distributions. In 
addition, the local dynamics is not convincingly realistic, as pointed out in Section 2. 






Appendix B: Meaning and frequency 



In this Appendix we'll consider some evidence in favor of the hypothesis that word 
frequency is proportional to the extent of its meaning. Far from being a systematic 
study, this is rather a methodological sketch. This study was done in Russian, the 
author's native language. In the English text we'll attempt to provide translations 
and/or equivalents wherever possible. 

Strictly speaking, one could prove the hypothesis only if an explicit measure of 
meaning extent were proposed. However, the frequency hypothesis allows us to make some 
verifiable predictions. Suppose that some "head" word $w_0$ has a set of partial synonyms 
and/or hyponyms ("specific" words) $\{w_1, \dots, w_n\}$, whose meanings together cover the 
meaning of $w_0$ without gaps or overlaps. Then, by definition, their total meaning 
extent is equal to that of $w_0$. In that case, the frequency hypothesis predicts that the 
sum total of hyponym frequencies should be close to the frequency of the head word. 

There are hardly many such examples in real language. First, pure hyponyms 
are not very common; it is more common for words to have intersecting meanings, 
as with плохой 'bad, poor' and худой 'skinny; torn, leaky; bad, poor'. Second, only 
in rare cases can one state confidently that the hyponyms cover the whole meaning of 
the head word. For example, in the domain of fine arts, натюрморт 'still life', пейзаж 
'landscape', and портрет 'portrait' are pure hyponyms of the word картина 'picture', 
but there exist other genres of painting that can't be accounted for with a frequency 
dictionary, since their names are phrases, rather than single words (жанровая сцена 
'genre painting', батальное полотно 'battle-piece'). 

Nevertheless, examples of this type do exist. Table 1 contains frequencies of the 
head word дерево, деревцо 'tree; also dimin.' and of the specific tree names found in 
the frequency dictionary [2]. We omitted words denoting primarily the fruit or bloom of 
the corresponding tree, such as груша 'pear', вишня 'sour cherry', рябина 'rowan' or 
магнолия 'magnolia'. To count them correctly, one would have to know the fraction of 
word instances denoting the tree specifically, and we don't have this data. 

From the table one can see that the sum of frequencies of specific tree names is very 
close to the frequency of the head word (we'll consider the "physicist's error margin" 
of 20% to be acceptable). Possibly, the word пальма 'palm tree' could be removed 
from the list: it is not clear why it turned out to be the sixth most frequent tree in 
Russian-language texts, ahead of липа 'linden' and яблоня 'apple tree'. However, small 
changes in the list will not conceptually affect the result. 
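
The comparison can be made explicit with a few lines of arithmetic. This is only an illustrative sketch: the helper names `relative_deviation` and `within_margin` are hypothetical, and the numbers are the head-word and hyponym sums reported in the tables of this Appendix for tree, flower, fish and fence.

```python
# (head-word frequency, summed hyponym frequency), both per million words,
# taken from the "sum" rows of the corresponding tables.
reported = {
    "tree": (232.60, 246.82),
    "flower": (146.72, 169.44),
    "fish": (140.05, 134.43),
    "fence": (66.72, 58.15),
}

def relative_deviation(head, hyponyms):
    """Relative difference between the hyponym sum and the head-word frequency."""
    return abs(hyponyms - head) / head

def within_margin(head, hyponyms, margin=0.20):
    """True if the deviation does not exceed the 20% margin used in the text."""
    return relative_deviation(head, hyponyms) <= margin

if __name__ == "__main__":
    for name, (head, hyp) in reported.items():
        dev = relative_deviation(head, hyp)
        print(f"{name:7s} deviation = {dev:.1%}, acceptable = {within_margin(head, hyp)}")
```

The same check can be applied to any other head word once its hyponym list has been fixed.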

This is just one example of many. Table 2 contains the frequencies of common flower 
names. They also sum up very close to the frequency of the word цветок (цветочек) 
'flower; also dimin.'. (The word колокольчик 'small bell; bluebell', frequency 11.08, is 
omitted here, since primarily it denotes a bell, and not a flower.) Possibly, subtracting 
the frequencies of figurative meanings of words like роза 'rose' would still improve the 
result. 

Names of berries also follow this pattern, see table 3. (Here and below, we list in 
the table captions some words not found in the dictionary, apparently because their 
frequency is less than one per million.) The difference is somewhat greater in this case, 
but we should take into account that малина and клюква possess active figurative 
Table 1: Tree. 

word | freq./mln | word | freq./mln
дерево 'tree' | 224.52 | сосна 'pine' | 38.07
деревцо 'tree (dimin.)' | 8.08 | дуб 'oak' | 27.24
 | | ёлка 'fir' | 26.57
 | | береза 'birch' | 
 | | тополь 'poplar' | 
 | | пальма 'palm tree' | 
 | | липа 'linden' | 13.89
 | | яблоня 'apple tree' | 13.41
 | | ива 'willow' | 7.96
 | | кедр 'cedar' | 7.77
 | | клен 'maple' | 7.53
 | | осина 'aspen' | 6.79
 | | лиственница 'larch' | 6.00
 | | ель 'fir' | 4.84
 | | орешник 'filbert' | 4.84
 | | вяз 'elm' | 3.31
 | | пихта 'fir' | 3.24
 | | кипарис 'cypress' | 3.18
 | | эвкалипт 'eucalyptus' | 2.51
 | | ольха 'alder' | 1.96
 | | ясень 'ash' | 1.90
 | | ветла 'willow' | 1.84
 | | бук 'beech' | 1.78
 | | платан 'platan' | 1.71
sum | 232.60 | sum | 246.82
Table 2: Flower. 

word | freq./mln | word | freq./mln
цветок 'flower' | | роза 'rose' | 
цветочек (dimin.) | | мак 'poppy' | 
 | | тюльпан 'tulip' | 
 | | одуванчик 'dandelion' | 
 | | сирень 'lilac' | 
 | | ромашка 'daisy' | 
 | | лилия 'lily' | 7.65
 | | гвоздика 'carnation' | 7.35
 | | подсолнух 'sunflower' | 5.02
 | | черемуха 'bird cherry' | 4.84
 | | лютик 'buttercup' | 4.10
 | | фиалка 'violet' | 4.22
 | | василек 'cornflower' | 3.61
 | | ландыш 'lily of the valley' | 2.94
 | | хризантема 'chrysanthemum' | 2.82
 | | крокус 'crocus' | 2.26
 | | нарцисс 'daffodil' | 2.20
 | | герань 'geranium' | 2.02
 | | астра 'aster' | 1.90
 | | подснежник 'snowdrop' | 1.78
 | | незабудка 'forget-me-not' | 1.65
 | | гладиолус 'gladiolus' | 1.29
 | | орхидея 'orchid' | 1.29
 | | пион 'peony' | 1.22
sum | 146.72 | sum | 169.44
and idiomatic meanings in Russian (resp., 'a criminal flat' and an approximate 
equivalent of 'red herring'). Besides, it is not quite clear whether the cherries вишня and 
черешня truly belong in this list: first, a considerable number of instances will refer to 
the corresponding trees, not fruits, and second, we are not certain whether the designation 
ягода 'berry' is appropriate for them. For instance, in the classical Dahl's dictionary, 
the entry for cherry starts with "A tree and its fruit...", while the entry for cranberry or 
raspberry starts with "A bush and its berry...". Of course, for the purposes of this work, 
it is a matter of lexicography, rather than botany. 

Table 3: Berry. Not in dictionary: gooseberry, cloudberry, and bilberry. 

word | freq./mln | word | freq./mln
ягода 'berry' | 25.83 | малина 'raspberry' | 7.59
ягодка (dimin.) | 3.00 | вишня 'sour cherry' | 6.98
 | | земляника 'wild strawberry' | 5.69
 | | рябина 'rowan berry' | 3.86
 | | смородина 'currant' | 3.98
 | | клубника 'strawberry' | 3.12
 | | клюква 'cranberry' | 2.94
 | | брусника 'lingonberry' | 2.82
 | | черника 'blueberry' | 2.69
 | | ежевика 'blackberry' | 2.08
 | | черешня 'cherry' | 1.47
sum | 28.83 | sum | 43.22
 | | without cherries | 34.77



In all three examples, we didn't have to face the question of how to prove that 
the hyponyms indeed cover the head word's meaning without overlaps (an object can't 
be both a gooseberry and a blueberry) and gaps (each berry has a specific name). 
However, some subtleties can already be found here. Thus, if "в сорок пять баба 
ягодка опять" (a proverb; lit.: "at 45 a woman is a berry again"), this "berry" is none 
of the berries we listed. On the other hand, воровская малина ('a criminal flat'; lit.: 
"thieves' raspberry") is not a berry. In this particular case, there is no doubt that such 
non-literal usage will not appreciably affect the results; what's more important, it is 
possible, at least in principle, to account for it by studying texts. Below we'll encounter 
much greater difficulties, which require systematic and more formal approaches. 

A somewhat different example is given in table 4, containing a classification of meat 
produce, which is pretty chaotic from a logician's point of view, but quite common 
in everyday use. We'll note that although a sausage can contain beef or pork, the 
meanings of the words колбаса 'sausage' and говядина 'beef' do not intersect (or intersect 
negligibly). The same can be said about other word pairs in the table. For the 
non-Russian reader, it should be noted that мясо does not have many of the extended meanings 
of English meat, and means practically nothing beyond 'the flesh of animals used as 
food'. But are all the hyponym meanings really contained within the meaning of the 
word мясо 'meat'? For instance, can we say that паштет 'pate' ⊂ мясо 'meat' (we 
will denote the relationships between meanings with the mathematical symbols of subset, 
intersection, and union: ⊂, ∩, ∪)? The evidence in favor of this statement is provided 
by locutions like Возьми паштет, тебе надо есть больше мяса ('Take some pate, 
you need meat to recover'). 

Table 4: Meat. Not in dictionary: ромштекс 'rump steak', шницель 'schnitzel'. 

word | freq./mln | word | freq./mln
мясо 'meat' | 84.47 | колбаса 'sausage, bologna' | 39.48
 | | котлета 'cutlet' | 11.81
 | | сосиска 'sausage' | 9.12
 | | ветчина 'ham' | 6.49
 | | баранина '(meat of) lamb' | 5.88
 | | свинина 'pork' | 5.82
 | | бифштекс 'steak' | 4.96
 | | говядина 'beef' | 4.22
 | | фарш 'ground meat' | 3.12
 | | паштет 'pate' | 3.06
 | | телятина 'veal' | 2.57
 | | сарделька 'wiener' | 1.78
 | | отбивная 'chop' | 1.47
 | | котлетка 'cutlet (dimin.)' | 1.22
sum | 84.47 | sum | 101.00



So far, we have only considered head words from a mid-frequency range (the most 
frequent, дерево 'tree', has a rank of 435). But supporting data can be found among 
high-frequency words as well. Table 5 classifies humans by age and gender (the rank 
of the word человек 'human, person' is 33; it is counted together with its plural form, 
люди). As an aside, we note the curious fact that the most frequent words for male 
and female persons come in exactly opposite order in terms of age: in the order of 
decreasing frequency we have старик 'old man', мальчик 'boy', парень 'lad, guy', 
мужчина 'man', but женщина 'woman', девушка 'young woman', девочка 'young 
girl', старуха 'old woman'. Also, the net frequency of all the male terms (1377) is 
practically the same as the net frequency of all the female terms (1339). Frequency is 
rather uniformly distributed over age groups as well. 

There are new difficulties in this case: obviously, there are significant intersections 
between the meanings of some hyponyms. This is mostly because 

мальчик, девочка 'boy, girl' ⊂ (ребенок 'child' ∪ дитя 'child' ∪ младенец 'baby') 

(a boy or a girl is almost necessarily a child or a baby)*. Indeed, the net frequency 
of the words ребенок, дитя, младенец 'child, baby' is 637.7, and the net frequency 
of the words мальчик, девочка, мальчишка, девчонка, пацан, паренек, парнишка, 
мальчонок 'boy, girl' is 702.94, which is pretty close. So we can subtract the net 
frequency of the neutral terms from the sum of frequencies, which makes the net frequency 
of the rest of the hyponyms very close to the frequency of the head word человек 'human'. 
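
The subtraction can be retraced step by step. The sketch below simply groups the per-million frequencies listed in Table 5; the grouping into three lists and the variable names are choices made here for illustration.

```python
# Frequencies per million from Table 5, grouped as in the text.
neutral = [593.50, 27.18, 17.02]                       # child, baby, child
boy_girl = [290.81, 191.04, 92.55, 58.95,              # boy/girl words,
            24.91, 21.73, 19.95, 3.00]                 # incl. diminutives
other = [584.32, 313.64, 286.53, 258.74, 252.98,       # woman, old man, ...
         105.89, 59.86, 58.09, 52.21, 40.95]

head = 2945.47                                         # head word 'human'

neutral_sum = round(sum(neutral), 2)
boy_girl_sum = round(sum(boy_girl), 2)
total = round(sum(neutral) + sum(boy_girl) + sum(other), 2)
without_neutral = round(total - neutral_sum, 2)

if __name__ == "__main__":
    print("neutral terms:  ", neutral_sum)
    print("boy/girl terms: ", boy_girl_sum)
    print("all hyponyms:   ", total)
    print("without neutral:", without_neutral)
    print("relative deviation from head word:",
          round(abs(without_neutral - head) / head, 3))
```

After the neutral terms are removed, the remaining sum differs from the head-word frequency by well under the 20% margin adopted earlier.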

* Of course, there are exceptions here, too. Compare a quote from the abovementioned Viktor Konetsky: A 
fiftyish grocery store saleswoman is universally called "девушка" (girl), even though she has five children. 
And I once heard older female road workers going for lunch say: "Let's go, girls!" Such a girl is not a child. 
Table 5: Human. 

word | freq./mln | word | freq./mln
человек 'human' | 2945.47 | ребенок 'child' | 593.50
 | | женщина 'woman' | 584.32
 | | старик 'old man' | 313.64
 | | мальчик 'boy' | 290.81
 | | девушка 'young woman' | 286.53
 | | парень 'lad, guy' | 258.74
 | | мужчина 'man' | 252.98
 | | девочка 'young girl' | 191.04
 | | старуха 'old woman' | 105.89
 | | мальчишка 'boy (derog.)' | 92.55
 | | девица 'girl; virgin' | 59.86
 | | девчонка 'young girl (derog.)' | 58.95
 | | юноша 'young man' | 58.09
 | | старушка 'old woman (dimin.)' | 52.21
 | | старичок 'old man (dimin.)' | 40.95
 | | пацан 'boy (dial., colloq.)' | 24.91
 | | младенец 'baby' | 27.18
 | | паренек 'boy, dimin. of lad' | 21.73
 | | парнишка 'boy, dimin. of lad' | 19.95
 | | дитя 'child' | 17.02
 | | мальчонок 'boy (dimin.)' | 3.00
sum | 2945.47 | sum | 3353.85
 | | without neut. terms | 2716.15
The frequency hypothesis works with words of relatively low frequency as well: see 
tables 6 (рыба 'fish') and 7 (забор 'fence'). 

Table 6: Fish. Not in dictionary: красноперка 'rudd', салака 'sprat', палтус 'halibut', 
ставрида 'scad', нототения, тунец 'tuna', кефаль 'mullet', налим 'burbot', плотва 
'roach', севрюга 'sturgeon', пескарь 'gudgeon', мурена 'moray', омуль 'omul'. 

word | freq./mln | word | freq./mln
рыба 'fish' | 120.03 | сазан 'sazan' | 16.47
рыбка (dimin.) | 20.02 | карась 'crucian' | 14.63
 | | акула 'shark' | 10.77
 | | селедка 'herring' | 9.61
 | | карп 'carp' | 9.24
 | | щука 'pike' | 9.06
 | | сом 'catfish' | 8.20
 | | скат 'ray' | 6.98
 | | судак 'pike perch' | 6.06
 | | лещ 'bream' | 5.51
 | | форель 'trout' | 4.53
 | | окунь 'perch' | 4.41
 | | вобла 'vobla' | 2.94
 | | камбала 'flounder' | 2.88
 | | угорь 'eel' | 2.82
 | | лосось 'salmon' | 2.57
 | | треска 'cod' | 2.14
 | | сельдь 'herring' | 2.08
 | | хек 'hake' | 2.02
 | | семга 'salmon' | 1.78
 | | осетр 'sturgeon' | 1.59
 | | ерш 'ruff' | 1.59
 | | сардина 'sardine' | 1.53
 | | стерлядь 'sterlet' | 1.47
 | | скумбрия 'mackerel' | 1.22
 | | белуга 'beluga' | 1.10
 | | горбуша 'salmon' | 1.10
sum | 140.05 | sum | 134.43



Let us now consider other parts of speech. Two simple examples with adjectives 
can be found in tables 8 (старый 'old') and 9 (красный 'red'). A more complicated 
example is given by the word большой 'big, large' shown in table 10. The net frequency 
of hyponyms significantly (by a quarter) exceeds the frequency of the head word. This is 
as expected, since some of the hyponyms' meanings definitely intersect: thus, огромный 
and громадный are as close to exact synonyms as it gets (cf. Eng. huge and enormous). 
However, there's a possibility for a deeper and more interesting analysis here. 

Consider locutions (32)-(41): 

Is this a raspberry or a strawberry? (32) 
*Is this a strawberry or a berry? (33) 
Table 7: Fence. Not in dictionary: палисад. 

word | freq./mln | word | freq./mln
забор 'fence' | 66.72 | ограда 'fence' | 25.83
 | | изгородь 'fence, hedge' | 10.59
 | | плетень 'wicker fence' | 9.61
 | | частокол 'stake fence' | 5.39
 | | штакетник 'picket fence' | 2.57
 | | загородка 'fence' | 2.20
 | | тын 'paling' | 1.96
sum | 66.72 | sum | 58.15



Table 8: Old. Not in dictionary: закоснелый, заматерелый, затасканный, зачерствелый, 
истасканный, подержанный, полинялый, поседелый, потрепанный, старобытный. 

word | freq./mln | word | freq./mln
старый 'old' | 528.25 | древний 'ancient' | 75.60
 | | пожилой 'elderly' | 63.17
 | | седой 'grey-haired' | 62.99
 | | старинный 'antique' | 53.07
 | | давний 'bygone' | 34.71
 | | бородатый 'bearded; old (of jokes)' | 18.67
 | | немолодой 'not young' | 16.34
 | | многолетний 'longstanding' | 11.51
 | | старомодный 'old-fashioned' | 11.51
 | | престарелый 'very old (of people)' | 10.04
 | | ветхий 'shabby, decrepit' | 9.67
 | | вековой 'age-old' | 6.86
 | | извечный 'primeval' | 6.67
 | | отсталый 'outdated, retrograde' | 5.94
 | | дряхлый 'decrepit' | 5.82
 | | устарелый 'outmoded, outdated' | 5.39
 | | ископаемый 'fossilized' | 5.20
 | | поношенный 'worn, shabby' | 4.77
 | | допотопный 'antediluvian' | 4.16
 | | давнишний 'bygone' | 3.55
 | | застарелый 'inveterate' | 3.37
 | | многовековой 'centuries-old' | 3.37
 | | исконный 'original' | 3.06
 | | заскорузлый 'calloused, backward' | 2.69
 | | закоренелый 'inveterate, ingrained' | 1.96
 | | истертый 'worn' | 1.71
 | | отживший 'obsolete' | 1.65
 | | архаический 'archaic' | 1.35
 | | стародавний 'ancient' | 1.35
 | | обветшалый 'shabby, decrepit' | 1.29
 | | архаичный 'archaic' | 1.04
sum | 528.25 | sum | 438.48
Table 9: Red. Not in dictionary: карминный, рдяный, червленый. 

word | freq./mln | word | freq./mln
красный 'red' | 316.64 | рыжий 'red-haired; rust-colored' | 89.8
 | | розовый 'rosy, pink' | 77.98
 | | алый 'scarlet' | 32.99
 | | кровавый 'bloody' | 32.93
 | | багровый 'crimson' | 22.16
 | | румяный 'ruddy' | 17.2
 | | малиновый 'crimson' | 14.02
 | | пунцовый 'crimson' | 3.55
 | | бордовый 'vinous' | 2.82
 | | багряный 'crimson (arch., poet.)' | 2.63
 | | коралловый 'coral' | 2.57
 | | морковный 'carrot (adj.)' | 2.57
 | | рубиновый 'ruby (adj.)' | 2.2
 | | пурпурный 'purple' | 1.84
 | | свекольный 'beet (adj.)' | 1.04
sum | 316.64 | sum | 306.3



Is this a boy or a girl? (34) 
Is this a boy or a man? (35) 
*Is this a boy or a child? (36) 
*Is this a boy or a person? (37) 
Do you want pork or pate? (38) 
(?)Купить свинину или мясо? '~Do you want pork or meat?' (39) 
(?)Купить паштет или мясо? '~Do you want pate or meat?' (40) 
*Купить говядину или мясо? '~Do you want beef or meat?' (41) 



Everything is clear with items (32)-(38): non-intersecting specific words can occur in 
alternative constructions with each other, but not with the head words. Locutions (39) and (40) 
are possible only if мясо 'meat' is used in constrained, specialized (sub)meanings 
existing in the vernacular: (мясо 'meat')$_1$ = говядина 'beef', (мясо 'meat')$_2$ = сырое мясо 'raw meat' 
(this is proved by the fact that (41) is not possible). These examples, therefore, also 
involve non-intersecting (non-overlapping) meanings. As a first approximation, we will 
consider this as a criterion of meaning overlap: if two words can participate in an 
alternative construction of this type, their meanings do not overlap. 

To apply this criterion to hyponyms of the word большой 'big, large', consider 
examples (42) and (43). Although the semantic difference between them is intuitively obvious, 
it is not easy to explicate. There are objects that are both long and wide, as well 
as objects that are both long and huge, and still the first example is perfectly valid, 
while the second one is impossible. But in keeping with the methodological principles 
of this work, we will not attempt to formulate the difference in semantic terms. On 
the contrary, we take the acceptability of a locution as a linguistic datum, and on this 
basis draw conclusions about word semantics. That is, we will define two words as 
non-overlapping in their meanings if they can participate in an alternative construction 
of this type. 



Is it wide or long? (42) 
*Is it long or huge? (43) 



Now, accepting the above criterion for non-overlapping meanings, we can select a 
subset of hyponyms from table 10 which do not overlap and mean roughly 'big/large 
in a certain dimension or trait'. Almost all remaining hyponyms are in fact emphatic or 
superlative terms: 'very big/large, regardless of dimension or trait' (only two, немалый 
'not small' and изрядный 'fairly large', are hard to classify). It is easy to make sure that 
the first group consists of virtually non-overlapping adjectives. Admittedly, in the lower 
part of the table, the criterion becomes less clear-cut: thus, the question in example (44) 
is somewhat awkward; however, it is meaningful and understandable, in contrast to (43). 
Of course, there is still some overlap in the meanings; after all, we're dealing with a 
living language. But it is small enough so that any further corrections will not change 
the result in any significant way (and may still improve it). 



In the 5th column of table [10] we sum up the frequencies of the hyponyms that are 
specifying the trait or dimension. The net frequency is very close to the frequency of 
the head word. 
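
As a check on this column, the sum can be recomputed from the rows of Table 10 that carry a trait label. This is only an illustrative sketch; the list name `trait_freqs` is a label chosen here, and the value listed as 3 in the table is entered as 3.0.

```python
# Frequencies per million of the hyponyms in Table 10 that have an entry
# in the "trait" column (height, significance, length, width, ...).
trait_freqs = [
    310.34, 247.90, 244.05, 187.31, 176.12, 151.74, 135.58, 132.52,
    60.17, 35.56, 28.03, 26.20, 18.24, 13.34, 9.73, 4.16, 3.43, 3.0, 1.47,
]

head = 1630.96            # head word 'big, large'
all_hyponyms = 2048.59    # sum over all hyponyms, as reported in the table

trait_sum = round(sum(trait_freqs), 2)

def excess(total, reference):
    """Relative excess of a hyponym sum over the head-word frequency."""
    return (total - reference) / reference

if __name__ == "__main__":
    print("trait-specifying hyponyms:", trait_sum)
    print("excess over head word, all hyponyms:  ", round(excess(all_hyponyms, head), 3))
    print("excess over head word, trait hyponyms:", round(excess(trait_sum, head), 3))
```

Restricting the sum to the trait-specifying hyponyms brings the excess over the head word from about a quarter down to roughly a tenth.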

The word маленький 'small, little' (table 11) is very similar. However, we face a 
new complication here: the main concept is expressed by three words, rather than 
one: маленький, небольшой and, possibly, малый. It is somewhat similar to the 
distinction between small and little in English. Consider the first two adjectives. 
Both are direct and stylistically neutral antonyms of большой 'big, large'. However, 
their meanings are distinct. For example, they are not interchangeable in 
common phrases like маленький мальчик 'little boy' and небольшое количество 'small 
amount': *небольшой мальчик and *маленькое количество are not normative (while 
the adjective большой 'big, large' can modify both nouns). But even when both 
adjectives are admissible, they mean different things. Thus, маленькая мышка '≈a little 
mouse' means 'small compared to the speaker, as all mice', or, less probably, 'a young 
mouse', but небольшая мышка '≈a small mouse' means 'small compared to other mice, 
less than usual mouse size'. Even when this distinction is not applicable, there still can 
be a quantitative difference, as in example (45). 

The king's palace is big! 
Is it spacious or grandiose? (44) 

- Этот кусок слишком большой. 'This piece is too big.' 
- Отрезать тебе небольшой или маленький? '~Do you want a smaller one or a small (45) 

As a result, we consider the words маленький and небольшой to have almost 
non-overlapping meanings. As for the adjective малый, in its long form it is used only in 
compound toponyms and scientific nomenclature (cf. Lesser Antilles). But in its short 
form, it has a common and distinctive meaning of 'too small to fit', not covered by 






Table 10: Big/large. 

word | freq./mln | word | freq./mln | trait | emphasis
большой 'big, large' | 1630.96 | высокий 'tall, high' | 310.34 | height | -
 | | огромный 'huge' | 298.95 | | +
 | | великий 'great (significant)' | 247.90 | significance | +
 | | длинный 'long (space)' | 244.05 | length | -
 | | широкий 'wide' | 187.31 | width | -
 | | толстый 'thick' | 176.12 | diameter; thickness | -
 | | крупный 'large-scale, coarse' | 151.74 | all dimensions | -
 | | глубокий 'deep' | 135.58 | depth | -
 | | долгий 'long (time)' | 132.52 | time | 
 | | значительный 'significant' | 60.17 | significance | 
 | | гигантский 'giant' | 42.24 | | +
 | | громадный 'tremendous' | 40.77 | | +
 | | длительный 'prolonged' | 35.56 | time | 
 | | просторный 'spacious' | 28.03 | space | 
 | | обширный 'vast' | 26.20 | extent | 
 | | немалый 'not small' | 22.83 | | 
 | | грандиозный 'grandiose' | 18.24 | impression, intent | +
 | | внушительный 'impressive' | 13.34 | impression | 
 | | колоссальный 'colossal' | 9.79 | | +
 | | громоздкий 'bulky' | 9.73 | all dims.; maneuverability | 
 | | изрядный 'fairly large' | 8.75 | | 
 | | исполинский 'gigantic' | 6.37 | | +
 | | масштабный 'large-scale' | 4.16 | intent; influence | 
 | | непомерный 'exorbitant' | 3.98 | | +
 | | объемный 'bulky' | 3.43 | bulk, volume | 
 | | объемистый 'voluminous' | 3 | volume, bulk | 
 | | большущий 'big (superl.)' | 2.14 | | +
 | | протяженный 'lengthy' | 1.47 | length | 
sum | 1630.96 | sum | 2048.59 | 1788.89 | 670.38
Table 11: Small. 

word | freq./mln | word | freq./mln | trait | emphasis
маленький 'small, little' | 411.52 | короткий 'short in length' | 202.55 | length | -
небольшой 'not large' | 180.08 | тонкий 'thin' | 144.58 | thickness | -
малый 'lesser; too small' | 108.71 | мелкий 'shallow; fine' | 125.05 | depth; all dims. | -
 | | узкий 'narrow' | 105.47 | width | -
 | | низкий 'low; short in height' | 78.23 | height | -
 | | тесный 'tight' | 33.18 | spaciousness | -
 | | крохотный 'tiny' | 28.4 | | +
 | | крошечный 'tiny' | 24.67 | - | +
 | | незначительный 'insignificant' | 20.69 | significance | -
 | | ничтожный 'very insignificant' | 19.71 | significance | +
 | | невеликий 'not great' | 13.04 | significance | 
 | | миниатюрный 'miniature' | 5.26 | | +
 | | неглубокий 'not deep' | 4.77 | depth | 
 | | неширокий 'not wide' | 3.86 | width | 
 | | малюсенький 'small (superl.)' | 3.61 | | +
 | | мизерный 'paltry' | 3.61 | | +
 | | микроскопический 'microscopic' | 3.06 | | +
 | | махонький 'wee' | 2.02 | | +
 | | недлинный 'not long' | 1.78 | length | 
sum | 700.31 | sum | 823.54 | 752.91 | 90.33



adjectives маленький and небольшой. Indeed, if туфли малы '≈shoes are too small', 
this doesn't necessarily mean that the shoes are small; they can still be size 10. But 
they are necessarily narrow, short, or tight. This is why the adjective малый is also 
placed in table 11 as a head word, and not as a hyponym. 

This argument is based on intuitive judgement about the acceptability of certain 
expressions, which is not a very solid foundation (cf. [30]). To improve it, one would 
have to formulate strict criteria of intersection and inclusion for meanings, and then 
demonstrate that they are satisfied. This is generally beyond the scope of the present 
essay, but one example of a completely objectivised approach is given below for the 
word плохой 'bad'. 

Verbs provide some good examples as well. See tables 12 (сказать 'say') and 13 
(думать 'think'), which do not require any comments. 

In two other verbs we encounter a complication of a new type: see tables 14 
(подниматься/расти 'rise/grow') and 15 (кричать/плакать 'shout/cry'). The words 
подниматься 'rise, ascend' and расти 'grow, increase' have some common sub-meanings, 
such as увеличиваться 'increase in quantity or size', as well as distinct ones, such as 
взлетать 'soar, take off' and расширяться 'widen, spread' respectively. For example, 
a temperature can both rise and grow (in Russian "температура растет" is much more 
common than "temperature grows" is in English), these expressions being quite 
synonymous and meaning an increase in temperature. On the other hand, an elevator can 
only rise, while a child can only grow (a child can rise up on the toes, but this is a 
completely different meaning, of course). Apparently, in every or almost every context, 
Table 12: Say. 

word | freq./mln | word | freq./mln
сказать 'say' | 3535.97 | спросить 'ask (a question)' | 934.32
 | | ответить 'answer' | 503.46
 | | рассказать 'tell' | 248.58
 | | произнести 'pronounce' | 178.98
 | | крикнуть 'shout' | 155.97
 | | попросить 'ask (for a favor)' | 154.62
 | | сообщить 'inform' | 148.80
 | | приказать 'command' | 107.18
 | | велеть 'order' | 95.67
 | | заявить 'state' | 86.61
 | | воскликнуть 'exclaim' | 81.66
 | | проговорить 'utter' | 78.35
 | | возразить 'object' | 69.66
 | | предупредить 'caution' | 51.23
 | | пробормотать 'mutter' | 49.52
 | | прошептать 'whisper' | 39.79
 | | пообещать 'promise' | 33.42
 | | возмутиться 'say indignantly' | 27.61
 | | осведомиться 'inquire' | 26.32
 | | буркнуть 'growl' | 25.52
 | | шепнуть 'whisper' | 24.79
 | | пошутить 'joke' | 24.06
 | | поздороваться 'greet' | 22.34
 | | выразиться 'curse' | 22.28
 | | попрощаться 'say goodbye' | 20.51
 | | скомандовать 'command' | 19.59
 | | проворчать 'growl' | 18.79
 | | рявкнуть 'bark out' | 17.14
 | | выговорить 'utter' | 16.22
 | | прокричать 'shout' | 12.67
 | | высказаться 'express' | 12.12
 | | провозгласить 'announce' | 11.94
 | | гаркнуть 'bawl' | 10.04
 | | молвить 'say (arch., poet.)' | 9.67
 | | промолвить 'say (arch., poet.)' | 6.18
 | | брякнуть 'blurt' | 6.18
 | | пролепетать 'babble' | 5.08
 | | промямлить 'mumble' | 4.47
 | | съязвить 'say sarcastically' | 3.67
 | | вопросить 'inquire' | 2.88
 | | вякнуть 'blather' | 2.51
 | | предостеречь 'warn' | 2.45
sum | 3535.97 | sum | 3372.85
Table 13: Think. 

word | freq./mln | word | freq./mln
думать 'think' | 936.40 | считать 'reckon' | 
 | | мечтать 'dream' | 
 | | полагать 'believe' | 
 | | предполагать 'presume' | 
 | | рассуждать 'reason' | 38.20
 | | соображать 'consider' | 36.36
 | | размышлять 'reflect on' | 
 | | воображать 'imagine' | 20.69
 | | мыслить 'conceive' | 
 | | раздумывать 'ponder' | 
 | | прикидывать 'reckon' | 
 | | обдумывать 'think over' | 11.14
 | | вникать 'fathom' | 7.53
 | | помышлять 'dream of' | 3.80
 | | замышлять 'scheme' | 2.75
 | | мнить 'imagine' | 2.33
 | | вдумываться 'ponder' | 1.47
 | | кумекать 'think (low colloq.)' | 1.16
sum | 936.40 | sum | 806.59



the verb увеличиваться 'increase' can be replaced with either подниматься 'rise' or
расти 'grow' (this is a statement about Russian verbs, not their approximate
equivalents in English), which means that its meaning is a subset of the intersection
of their meanings (see Fig. 8).

It turns out that the net frequencies of hyponyms match the head word frequencies
in both columns of table 14. This would even allow one to quantify the degree of
commonality between the meanings of the two head words. Exactly the same behavior can
be observed with the words кричать 'shout' and плакать 'cry' (table 15).
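As a quick arithmetic check of this claim for кричать 'shout' and плакать 'cry', the sketch below sums the hyponym frequencies transcribed from table 15 and compares them with the head word frequencies. The helper names and the final "shared share" ratio are illustrative assumptions, not a measure defined in the text.

```python
# Frequencies per million words, transcribed from table 15.
# Verbs listed under both head words appear in both dictionaries.
SHOUT_HEAD = 220.36  # кричать 'shout'
CRY_HEAD = 120.71    # плакать 'cry, weep'

shout_hyponyms = {
    "орать": 67.64, "шуметь": 44.62, "реветь": 26.99, "выть": 17.51,
    "визжать": 16.04, "вопить": 15.98, "надрываться": 6.86, "галдеть": 4.41,
    "верещать": 3.67, "скандалить": 3.67, "голосить": 3.24,
    "горланить": 1.84, "гомонить": 1.35,
}
cry_hyponyms = {
    "реветь": 26.99, "рыдать": 18.30, "выть": 17.51, "визжать": 16.04,
    "всхлипывать": 12.06, "скулить": 4.84, "пищать": 4.28, "хныкать": 2.20,
}


def net_frequency(hyponyms):
    """Sum of hyponym frequencies, rounded to the table's precision."""
    return round(sum(hyponyms.values()), 2)


def shared_frequency(first, second):
    """Net frequency of the words listed under both head words."""
    return round(sum(freq for word, freq in first.items() if word in second), 2)


shout_net = net_frequency(shout_hyponyms)
cry_net = net_frequency(cry_hyponyms)
shared = shared_frequency(shout_hyponyms, cry_hyponyms)

print(f"shout: hyponyms {shout_net} vs head word {SHOUT_HEAD}")
print(f"cry:   hyponyms {cry_net} vs head word {CRY_HEAD}")
# Hypothetical commonality measure: shared hyponym frequency as a share
# of each head word frequency (an assumption made for illustration).
print(f"shared {shared}: {shared / SHOUT_HEAD:.2f} of shout, {shared / CRY_HEAD:.2f} of cry")
```

With the figures of table 15, the hyponym sums come to 213.82 and 102.22 per mln, against head word frequencies of 220.36 and 120.71.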

Finally, consider two more adjectives, хороший 'good' (table 16) and плохой 'bad,
poor' (table 17). Synonyms (or rather hyponyms) were collected from dictionaries. The
former word doesn't cause any difficulties: the net frequency of hyponyms corresponds
well with the head word frequency. However, with the adjective плохой 'bad' the
situation is quite different. Note first of all that the four most frequent synonyms
offered by the dictionaries (худой 'skinny; leaky; bad', низкий 'low, short; base, mean',
дешевый 'cheap, worthless', жалкий 'pitiful; wretched') are not included in the table,
because each of them has a primary meaning that does not directly imply badness.
Something or somebody can be cheap and good, skinny and good, etc. But even without
them, the net frequency of hyponyms is significantly higher than the head word frequency.

Notice though that the hyponyms can be roughly classified into two categories:
those denoting more of an objective quality of an object, like скверный (cf. Eng. poor
in its senses unrelated to pitying and lack of wealth), and those denoting more of a
subjective feeling towards the object, like мерзкий 'loathsome, vile'. The head word
itself falls more in the former category. To demonstrate this, consider the expression
плохой вор 'a bad thief'. Its meaning is 'one who is not good at the art of stealing',
Figure 8: Rise and grow (cf. table 14).




Table 14: Rise and grow. Translations are very approximate.

word                              freq./mln word                              freq./mln
подниматься 'rise'                102.41    расти 'grow'                      71.74
увеличиваться 'increase'          21.24     увеличиваться 'increase'          21.24
вырастать 'grow'                  13.04     вырастать 'grow'                  13.04
возрастать 'grow'                 12.12     возрастать 'grow'                 12.12
прибывать 'rise, swell'           12.12     прибывать 'grow, swell'           12.12
взлетать 'soar up, take off'      14.38
взбираться 'climb'                8.81
                                            расширяться 'spread, widen'       8.32
всплывать 'rise to the surface'   6.92
вздыматься 'heave'                5.51
подрастать 'grow'                 4.77      подрастать 'grow'                 4.77
восходить 'rise, ascend'          4.04
всходить 'rise, ascend'           3.98
возноситься 'rise, tower'         1.71      возноситься 'rise, tower'         1.71
                                            взрослеть 'mature'                2.94
                                            шириться 'expand, widen'          2.69
                                            совершенствоваться 'improve'      2.69
взвиваться 'soar up, be hoisted'  1.78
                                            умножаться 'multiply'             1.16
sum                               108.64    sum                               82.8
Table 15: Shout and cry. Translations are very approximate.

word                          freq./mln word                    freq./mln
кричать 'shout'               220.36    плакать 'cry, weep'     120.71
орать 'yell'                  67.64
шуметь 'make noise'           44.62
реветь 'roar; cry'            26.99     реветь 'roar; cry'      26.99
                                        рыдать 'sob'            18.30
выть 'wail'                   17.51     выть 'wail'             17.51
визжать 'shriek'              16.04     визжать 'shriek'        16.04
вопить 'bawl'                 15.98
                                        всхлипывать 'sob'       12.06
надрываться 'bawl'            6.86
                                        скулить 'whine'         4.84
галдеть 'clamor'              4.41
                                        пищать 'squeak'         4.28
верещать 'chirp, squeal'      3.67
скандалить 'brawl'            3.67
голосить 'wail'               3.24
                                        хныкать 'whimper'       2.20
горланить 'bawl'              1.84
гомонить 'shout'              1.35
sum                           213.82    sum                     102.22



in contrast to мерзкий вор 'vile thief' = 'one whom I loathe because he steals'. Hence,
only the frequencies of the hyponyms from the first category (denoting quality) should
sum up to the frequency of the head word.

But it is quite difficult to actually classify the words into these two categories. The
"subjective" words tend to evolve towards emphatic terms, and further migrate to the
"objective" group or close to it. So we need a method that would allow us to perform the
classification without relying on dubious judgements based on linguistic intuition.
To this end, notice that there exist three classes of nouns by their compatibility with
the adjectives from table 17. Neutral nouns, like погода 'weather', can be found equally
easily in noun phrases with both скверный 'bad, poor' and мерзкий '≈disgusting'.
However, nouns carrying a distinct negative connotation, such as предатель 'traitor',
are well compatible with мерзкий '≈disgusting', but not with скверный 'bad, poor'.
On the contrary, nouns with a distinct positive connotation have the opposite preference:
cf. скверный поэт 'bad poet' and ?*мерзкий поэт 'disgusting poet'. It is possible to
find out which of the adjectives in table 17 tend to apply preferentially to positive or
negative nouns by using an Internet search engine.

We considered eight test nouns: negative гадость '≈filth', дрянь '≈trash', предатель
'traitor', предательство 'treason' and positive здоровье 'health', врач 'doctor', поэт
'poet', актер 'actor'. They were initially selected for maximum contrast in their
compatibility with the adjectives скверный and мерзкий. Then we used the Russian-specific
search engine Yandex (http://www.yandex.ru) to determine the frequencies of noun phrases
constructed from each of the adjectives with each of the nouns.

It should be noted here that search engines can't be directly used as replacements 
Table 16: Good.

word              freq./mln word                                  freq./mln
хороший 'good'    853.71    добрый 'good, kind'                   201.38
                            прекрасный 'splendid, excellent'      140.11
                            приятный 'nice'
                            блестящий 'brilliant'
                            замечательный 'remarkable'            dU.o4
                            благородный 'noble'                   57.66
                            отличный 'excellent'                  4z.z4
                            славный 'glorious, nice'              38.44
                            великолепный 'magnificent'            o4.yo
                            чудесный 'wonderful'                  34.46
                            роскошный 'splendid'                  Z 1 .\)L
                            неплохой 'not bad'                    2b. 6a
                            чудный 'wonderful'                    io. 1 1
                            превосходный 'excellent'              13.47
                            прелестный 'lovely, delightful'       12. z4
                            /J^MrSrlblM Cliai llllllg
                            благой 'good'                         8.88
                            безупречный 'impeccable'              8.63
                            образцовый 'exemplary'                8.57
                            годный 'suitable, valid'              7.96
                            путный 'worthwhile'                   7.77
                            отменный 'excellent'                  7.35
                            изумительный 'marvellous'             6.79
                            восхитительный 'adorable'             6.55
                            пригодный 'suitable'                  6.49
                            добросовестный 'conscientious'        4.53
                            удовлетворительный 'satisfactory'     3.86
                            доброкачественный 'of good quality'   3.31
                            благоустроенный 'well-furnished'      2.08
                            похвальный 'laudable'                 1.96
                            бесподобный 'incomparable'            1.78
sum               853.71    sum                                   938.79
Table 17: Bad. Some translations are very approximate.

word                  freq./mln word                                    freq./mln weight    quality?
плохой 'bad, poor'    102.22    дурной 'bad, mean'                      40.40     0.911     +
                                противный 'repugnant'                   28.34     -0.0584
                                отвратительный 'disgusting'             21.85     -0.439
                                нехороший 'not good'                    20.14     0.914     +
                                мерзкий 'vile'                          13.22     -1.946
                                скверный 'bad, poor'                    13.16     0.896     +
                                гнусный 'abominable'                    12.73     -3.160
                                поганый 'foul'                          11.51     -0.330
                                паршивый 'nasty'                        10.16     -0.407
                                кошмарный 'nightmarish'                 9.30      -0.180
                                негативный 'negative'                   7.10      -0.183
                                неважный 'rather bad'                   6.00      1.200     +
                                омерзительный 'disgusting'              6.00      -0.432
                                гадкий 'repulsive; nasty'               5.33      -0.490
                                хреновый 'bad, poor (colloq.)'          5.14      2.358     +
                                никчемный 'worthless'                   5.08      0.144     +
                                негодный 'worthless'                    4.10      0.157     +
                                дрянной 'rotten, trashy'                3.92      -0.110
                                никудышный 'worthless'                  3.37      3.095     +
                                захудалый 'run-down'                    2.57      0.347     +
                                неприглядный 'unsightly'                2.39
                                незавидный 'unenviable'                 1.90      -0.161
                                дерьмовый 'shitty'                      1.90      0.161     +
                                фиговый 'bad, poor (colloq.)'           1.78      0.545     +
                                неудовлетворительный 'unsatisfactory'   1.65      -0.077
                                паскудный 'foul, filthy'                1.59      -0.203
                                отвратный 'disgusting'                  1.41      -0.165
                                грошовый 'dirt-cheap'                   1.35      -0.172
                                бросовый 'worthless, trashy'            1.35
                                пакостный 'foul, mean'                  1.35      -0.234
                                одиозный 'odious'                       1.35      -0.122
                                сволочной 'mean, vile'                  1.04      -0.318
                                аховый 'rotten'                                   -0.109
                                дефектный 'defective'                             -0.179
                                завалящий 'worthless'                             -0.078
                                мерзостный 'disgusting'                           -0.270
                                мерзопакостный 'disgusting'                       -0.302
                                низкопробный 'low-grade'                          -0.406
                                отталкивающий 'revolting'                         -0.198
sum                   102.22                                            248.48              103.64



for a frequency dictionary. First, they typically report the number of "pages" and 
"sites", but not the number of word instances. Meanwhile, web pages can be of very 
different size, and may contain multiple instances of a word or search phrase. Second, 
search engines trim the results to exclude "similar pages" and avoid duplicates, i.e. texts 
available in multiple copies or from multiple addresses. It's not clear whether this is 
correct behavior from the point of view of calculating frequencies. Finally, the corpus 
with which search engines work, the whole of the Web, is by no means well-balanced 
according to the criteria of frequency dictionary compilers. So the results from search 
engines can't be directly compared with the data from frequency dictionaries. But for 
our purposes we need only relative figures, and we are interested in their qualitative 
behavior only. The effect we are looking for, if it exists, should be robust enough to 
withstand the inevitable distortion. 

The frequencies of noun phrases constructed from each of the adjectives $a_i$ with
each of the test nouns $n_j$ form a matrix $N_{ij}$ presented in table 18. One can readily
see that the rows "мерзкий" and "скверный" clearly separate the test nouns into two
groups preferentially compatible with one or the other. Many other rows of the table
(e.g. "гнусный" and "неважный") behave in the same way. But there are rows that do
not, and that is precisely the reason to consider multiple test words. Thus the adjective
негодный '≈worthless' is well compatible with all the positive test nouns, but also with
the negative test noun дрянь '≈trash'. The adjectives неприглядный 'unsightly' and
бросовый 'worthless', as it turns out, are not compatible with any of them, so they
are excluded from further analysis. Their low frequency can't appreciably change the
result anyway.

To recap, we want to classify the rows of table 18 by whether each row is more
similar to the row "скверный" (quality of the object) or to the row "мерзкий" (speaker's
attitude towards the object). This can be done via a statistical procedure known as
principal component analysis, or the method of empirical eigenfunctions.

First, each row of table 18 was normalized by subtracting the average and dividing
by the standard deviation. This makes the rows "мерзкий" and "скверный" roughly
opposite to each other: positive on positive test nouns and negative on negative ones,
or vice versa. Then the correlation matrix of the table's columns was calculated (size 8x8),
along with its first eigenvector $n_j$. Finally, the scalar product of the eigenvector with
the $i$-th row of the table yields the weight of the corresponding adjective,
$a_i = \sum_j N_{ij} n_j$.
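The procedure just described can be sketched in a few lines. This is only an illustration under stated assumptions: the page counts below are hypothetical (they are not the data of table 18), the function names are invented for the example, and a plain power iteration stands in for whatever eigen-solver was actually used. The sign of an eigenvector is arbitrary, so the sketch orients it by requiring a positive weight for the reference adjective скверный.

```python
import math

# Hypothetical page counts (NOT the data of table 18): rows are adjectives,
# columns are the eight test nouns, four negative followed by four positive.
NOUNS = ["гадость", "дрянь", "предатель", "предательство",
         "здоровье", "врач", "поэт", "актер"]
COUNTS = {
    "скверный": [5, 8, 2, 1, 120, 60, 90, 70],
    "мерзкий": [200, 150, 300, 120, 3, 10, 6, 12],
    "неважный": [1, 2, 0, 1, 80, 25, 30, 40],
    "гнусный": [150, 110, 250, 180, 1, 4, 2, 5],
    "негодный": [3, 40, 2, 1, 20, 30, 15, 25],
}


def zscore(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def first_eigenvector(matrix, iterations=1000):
    """Dominant eigenvector of a symmetric positive semidefinite matrix (power iteration)."""
    size = len(matrix)
    vec = [float(k + 1) for k in range(size)]  # arbitrary start vector
    for _ in range(iterations):
        nxt = [sum(matrix[r][c] * vec[c] for c in range(size)) for r in range(size)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


def adjective_weights(counts, reference):
    """Return the pattern row n_j and the weights a_i = sum_j N_ij n_j."""
    adjectives = list(counts)
    rows = [zscore(counts[adj]) for adj in adjectives]  # normalize each row
    n_cols = len(rows[0])
    cols = [zscore([row[j] for row in rows]) for j in range(n_cols)]
    # Correlation matrix of the table's columns (8x8 for eight test nouns).
    corr = [[sum(x * y for x, y in zip(cols[j], cols[k])) / len(rows)
             for k in range(n_cols)] for j in range(n_cols)]
    pattern = first_eigenvector(corr)
    weights = {adj: sum(z * n for z, n in zip(row, pattern))
               for adj, row in zip(adjectives, rows)}
    if weights[reference] < 0:  # fix the arbitrary sign of the eigenvector
        pattern = [-n for n in pattern]
        weights = {adj: -w for adj, w in weights.items()}
    return pattern, weights


pattern, weights = adjective_weights(COUNTS, reference="скверный")
for adj, weight in weights.items():
    print(f"{adj:10s} {weight:+.3f}")
```

On this toy input, adjectives whose counts concentrate on the positive nouns receive positive weights and those concentrating on the negative nouns receive negative ones, which is the separation the method relies on.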

Mathematically, the result of this procedure is that the product $a_i n_j$ provides the
best (in terms of mean square) approximation of this kind to the matrix $N_{ij}$. In other
words, each row of the normalized table 18 is approximately proportional to the pattern
row $n_j$ multiplied by the weight $a_i$. The pattern row is given at the bottom of table
18. As expected, it correctly classifies test nouns as positive and negative. This means
that they actually behave in opposite ways relative to the adjectives of interest. Now
we can classify all the adjectives with positive weights $a_i > 0$ as proper hyponyms of
the word плохой 'bad, poor'. The weights are shown in table 17 (in an arbitrary
normalization). The table shows that the net frequency of these proper hyponyms
(103.64 per mln) is very close to the frequency of the head word (102.22 per mln).
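The final bookkeeping step can be reproduced directly from table 17. The sketch below uses the frequencies and weights transcribed from that table; the variable names are illustrative, and the seven adjectives that have a weight but no listed frequency are left out because they contribute nothing to the sums.

```python
# (word, frequency per mln, weight) transcribed from table 17.
# weight is None for the two adjectives excluded from the analysis.
HEAD_FREQ = 102.22  # плохой 'bad, poor'

ROWS = [
    ("дурной", 40.40, 0.911), ("противный", 28.34, -0.0584),
    ("отвратительный", 21.85, -0.439), ("нехороший", 20.14, 0.914),
    ("мерзкий", 13.22, -1.946), ("скверный", 13.16, 0.896),
    ("гнусный", 12.73, -3.160), ("поганый", 11.51, -0.330),
    ("паршивый", 10.16, -0.407), ("кошмарный", 9.30, -0.180),
    ("негативный", 7.10, -0.183), ("неважный", 6.00, 1.200),
    ("омерзительный", 6.00, -0.432), ("гадкий", 5.33, -0.490),
    ("хреновый", 5.14, 2.358), ("никчемный", 5.08, 0.144),
    ("негодный", 4.10, 0.157), ("дрянной", 3.92, -0.110),
    ("никудышный", 3.37, 3.095), ("захудалый", 2.57, 0.347),
    ("неприглядный", 2.39, None), ("незавидный", 1.90, -0.161),
    ("дерьмовый", 1.90, 0.161), ("фиговый", 1.78, 0.545),
    ("неудовлетворительный", 1.65, -0.077), ("паскудный", 1.59, -0.203),
    ("отвратный", 1.41, -0.165), ("грошовый", 1.35, -0.172),
    ("бросовый", 1.35, None), ("пакостный", 1.35, -0.234),
    ("одиозный", 1.35, -0.122), ("сволочной", 1.04, -0.318),
]

# Proper hyponyms: adjectives with a positive weight.
proper = [(word, freq) for word, freq, weight in ROWS
          if weight is not None and weight > 0]
proper_net = round(sum(freq for _, freq in proper), 2)
all_net = round(sum(freq for _, freq, _ in ROWS), 2)

print(f"all listed synonyms: {all_net} per mln")
print(f"proper hyponyms:     {proper_net} per mln ({len(proper)} words)")
print(f"head word:           {HEAD_FREQ} per mln")
```

The two sums agree with the bottom row of table 17: 248.48 for all listed synonyms and 103.64 for the eleven adjectives with positive weights.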

So it can be seen that the frequency hypothesis is confirmed here as well, and this 
conclusion is not based on any intuitive judgement about word semantics. 
Table 18: Compatibility of the hyponyms of плохой 'bad, poor' with test nouns on the Web
("the number of pages").
We conclude with a brief discussion of some encountered counterexamples. In contrast
to the words дерево 'tree', цветок 'flower', ягода 'berry', and рыба 'fish', the
words животное 'animal' and, to a lesser extent, птица 'bird' are significantly less
frequent than predicted by the net frequency of their hyponyms. The reason is probably
that some of the most frequent animal and bird names have very wide connotations, far
beyond the notion of 'this or that animal/bird'; e.g. осел 'donkey, ass' and орел 'eagle'
(apparently, a much less loaded word in English than in Russian, where it readily stands
for power, grandeur, nobility, both straight and ironic). It is not surprising, then, that
the frequency of such words is much greater than it would be had they denoted strictly
the corresponding animals. (See also the discussion of the words собака 'dog' and
лошадь 'horse' in Section 2.) Among tree and flower names, only a small number are like
that, and to a much smaller degree, e.g. дуб 'oak' (its Russian figurative meaning as
'a dumb, insensitive person' doesn't seem to have a counterpart in English) and роза
'rose' (which doesn't have any fixed dictionary senses other than the flower, but has an
established tradition of metaphoric usage). It is possible, at least in principle, to
quantify the last statement by analyzing the actual word usage, and then counterexamples
could turn into confirming evidence.

Interesting counterexamples are provided by the words страна 'country, state', город
'city, town', река 'river, creek', and озеро 'lake'. The net frequency of the nouns страна
'country, state', государство 'state, nation', республика 'republic', and королевство
'kingdom' is 705.39 per mln. The net frequency of all the countries of the world found
in the dictionary (except the former Soviet republics) is 1206.05, which is about 70%
too much. However, the first word in the list, Россия 'Russia', is four times as frequent
as the number two (Германия 'Germany'). Its frequency is 358.88 per mln and is
responsible for most of the discrepancy. Of course, Russia for Russian speakers is much
more than just another country. Most of the rest of the discrepancy can be attributed
to the fact that the word Америка 'America' denotes two continents and a part of the
world, in addition to the country.

The situation is very similar with the word город 'city, town'. Its frequency is
630.59 per mln, while the net frequency of all city names we could find in the dictionary
is 1087.18. But here again, Москва 'Moscow' (frequency 420.89, 5-6 times more than
the next city name) is responsible for the whole discrepancy. "Москва... Как много в
этом звуке"*.
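The size of the country and city discrepancies, and the effect of setting aside Россия and Москва, can be checked with a few lines of arithmetic. The figures are those quoted in the text; the function name and the "relative excess" measure are assumptions made for illustration.

```python
def relative_excess(net_names, head):
    """How far the net frequency of names exceeds the head word group, as a fraction."""
    return (net_names - head) / head


# Frequencies per million words, as quoted in the text.
COUNTRY_HEAD = 705.39    # страна + государство + республика + королевство
COUNTRY_NAMES = 1206.05  # all country names found in the dictionary
RUSSIA = 358.88          # Россия

CITY_HEAD = 630.59       # город
CITY_NAMES = 1087.18     # all city names found in the dictionary
MOSCOW = 420.89          # Москва

country_all = relative_excess(COUNTRY_NAMES, COUNTRY_HEAD)
country_rest = relative_excess(COUNTRY_NAMES - RUSSIA, COUNTRY_HEAD)
city_all = relative_excess(CITY_NAMES, CITY_HEAD)
city_rest = relative_excess(CITY_NAMES - MOSCOW, CITY_HEAD)

print(f"countries: {country_all:.0%} excess, {country_rest:.0%} without Россия")
print(f"cities:    {city_all:.0%} excess, {city_rest:.0%} without Москва")
```

By these figures the excess is roughly 71% for countries and 72% for cities, falling to about 20% and 6% once the single dominant name is set aside.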

On the other hand, the net frequency of all the river names in the dictionary is
somewhat less than the frequency of the word река 'river' (187.61 vs. 199.36), and
that despite the fact that Дон, Урал, and Амур are not just river names (a Spanish
nobleman title, the Ural mountains, and 'Cupid; love affair' respectively). This same
effect is much more pronounced with the word озеро 'lake': its frequency is 74.496 while
the net frequency of all the lake names in the dictionary is only 21.72. Most probably,
this is because only five lake names made it to the dictionary: Байкал 'Baikal', Ладога
'Ladoga', Онега 'Onega', Виктория 'Victoria' (some instances are, probably, personal
names), Иссык-Куль 'Issyk-Kul'. Most lake names either fall below the 1 per mln
threshold, or are homonymous with common names or adjectives. The same is true to
a lesser degree for river names.

* "Moscow... how much the sound embraces", from Pushkin's Eugene Onegin.
To summarize, we demonstrated on several examples that the hypothesis of word
frequency being proportional to the extent of its meaning is supported by available data,
while counterexamples are few and tend to have plausible explanations. Of course,
a much more thorough and systematic investigation is needed before the hypothesis
can be considered proven. We only sketched some promising approaches to such an
investigation. But it should also be noted that the examples considered span a wide
range of word frequencies, include all three main parts of speech, and involve very
common words, not specially hand-picked ones.